Abstract

We consider the minimization of the number of non-zero coefficients (the $\ell_0$ "norm") of the
representation of a data set in terms of a dictionary under a fidelity constraint. (Both the dictionary
and the norm defining the constraint are arbitrary.) This (nonconvex) optimization problem naturally
leads to the sparsest representations, compared with other functionals used instead of the $\ell_0$ "norm".

Our goal is to measure the sets of data yielding a $K$-sparse solution, i.e. involving $K$ non-zero
components. Data are assumed uniformly distributed on a domain defined by any norm, to be chosen
by the user. A precise description of these sets of data is given and relevant bounds on the Lebesgue
measure of these sets are derived. They naturally lead to bounds on the probability of getting a $K$-sparse
solution. We also express the expectation of the number of non-zero components. We further specify
these results in the case of the Euclidean norm, the dictionary being arbitrary.

1 Introduction

1.1 The problem under consideration 

Our goal is to represent observed data $d \in \mathbb{R}^N$ in an economical way using a dictionary
$(\psi_i)_{i\in I}$ on $\mathbb{R}^N$, where $I$ is a finite set of indexes and

$$\operatorname{span}\{\psi_i : i \in I\} = \mathbb{R}^N. \tag{1}$$

We study the sparsest representation where the (unknown) coefficients $(\lambda_i)_{i\in I}$ are estimated
by solving the constrained optimization problem $(\mathcal{P}_d)$ given below:

$$(\mathcal{P}_d):\quad \begin{cases} \text{minimize}_{(\lambda_i)_{i\in I}}\ \ell_0\big((\lambda_i)_{i\in I}\big), \\ \text{under the constraint } \Big\|\sum_{i\in I}\lambda_i\psi_i - d\Big\| \le \tau, \end{cases} \tag{2}$$

with

$$\ell_0\big((\lambda_i)_{i\in I}\big) \stackrel{\rm def}{=} \#\{i \in I : \lambda_i \neq 0\},$$

where $\#$ stands for cardinality, $\|.\|$ is an arbitrary norm and $\tau > 0$ is a fixed parameter. Let us emphasize
that for any $d \in \mathbb{R}^N$, the constraint set in $(\mathcal{P}_d)$ is nonempty thanks to (1) and that the minimum is reached
since $\ell_0$ takes its values in the finite set $\{0, 1, \ldots, \#I\}$.

Given the data $d$, the norm $\|.\|$, the parameter $\tau$ and the dictionary, the solution of $(\mathcal{P}_d)$ is the sparsest
possible, since the objective function $\ell_0$ in (2) counts all non-zero coefficients without penalizing
their values.

The function $\ell_0$ is sometimes abusively called the $\ell_0$-norm. It can equivalently be written as

$$\ell_0\big((\lambda_i)_{i\in I}\big) = \sum_{i\in I}\varphi(\lambda_i) \quad\text{where}\quad \varphi(t) = \begin{cases} 0 & \text{if } t = 0, \\ 1 & \text{if } t \neq 0. \end{cases} \tag{3}$$
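To make the problem $(\mathcal{P}_d)$ concrete, here is a minimal brute-force sketch. It assumes that $\|.\|$ is the Euclidean norm (the text allows any norm), in which case the best approximation of $d$ with support $J$ is the orthogonal projection onto the span of the selected atoms. The function names `residual_norm` and `val_Pd` are hypothetical, and the enumeration is exponential in the dictionary size, so this is only meant for toy examples.

```python
from itertools import combinations
import math


def residual_norm(d, vectors, tol=1e-12):
    """Euclidean distance from d to span(vectors), via Gram-Schmidt."""
    basis = []
    for v in vectors:
        w = list(v)
        for b in basis:
            c = sum(x * y for x, y in zip(w, b))
            w = [x - c * y for x, y in zip(w, b)]
        n = math.sqrt(sum(x * x for x in w))
        if n > tol:  # skip atoms that are linearly dependent on the previous ones
            basis.append([x / n for x in w])
    r = list(d)
    for b in basis:
        c = sum(x * y for x, y in zip(r, b))
        r = [x - c * y for x, y in zip(r, b)]
    return math.sqrt(sum(x * x for x in r))


def val_Pd(d, dictionary, tau):
    """Smallest support size k (and one support J) such that d is within
    Euclidean distance tau of span{psi_i : i in J}."""
    for k in range(len(dictionary) + 1):
        for J in combinations(range(len(dictionary)), k):
            if residual_norm(d, [dictionary[i] for i in J]) <= tau:
                return k, J
    raise ValueError("the dictionary does not span the space")


# Toy dictionary in R^2 (an assumption made for this example only).
D = [(1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
print(val_Pd((2.0, 2.1), D, 0.2))
```

With this toy dictionary, the datum `(2.0, 2.1)` lies close to the line spanned by the third atom, so a single non-zero coefficient suffices at tolerance `0.2`.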

The function $\varphi$ is discontinuous at zero and $C^\infty$ away from the origin, and has a long history. It was used in
the context of Markov random fields by Geman and Geman in 1984 [8] and by Besag in 1986 [1] as a prior in
MAP energies to restore labeled images (i.e. each $\lambda_i$ belonging to a finite set of values):

$$E(\lambda) = \Big\|\sum_{i\in I}\lambda_i\psi_i - d\Big\|_2^2 + \beta \sum_{i \sim j} \varphi(\lambda_i - \lambda_j), \tag{4}$$

where the last term in (4) counts the number of all pairs of dissimilar neighbors $i$ and $j$ (written $i \sim j$), and $\beta > 0$ is
a parameter. This label-designed form is known as the Potts prior model, or as the multi-level logistic
model [2, 11]. Guided by the Minimum Description Length principle of Rissanen, Y. Leclerc proposed
in 1989 in [10] the same prior to restore piecewise constant, real-valued images. The hard-thresholding
method to restore noisy wavelet coefficients, proposed by Donoho and Johnstone in 1992, see [6], amounts
to minimizing for each coefficient a function of the form $\|\lambda_i - g_i\|_2^2 + \beta\varphi(\lambda_i)$, where the noisy coefficients
read $g_i = \langle \psi_i, d\rangle$, $\forall i \in I$, and $(\psi_i)_{i\in I}$ is a wavelet basis. Very recently, the energy (4) was successfully
used by Robini and Magnin [19] to reconstruct 3D tomographic images by stochastic continuation.
Let us notice that even though the problem $(\mathcal{P}_d)$ in (2) and the minimization of $E$ in (4) are closely related,
there is no rigorous equivalence in general.

The context of digital image compression is of particular interest, since it is typically the problem we
are modeling in the paper. In compression, one considers different classes of images. Those digital images
live in $\mathbb{R}^N$ and are obtained by sampling an analogue image. Their distribution in $\mathbb{R}^N$ is one of the main
unknowns in image processing and, in practice, we only know some realizations of this distribution (i.e.
some images). Given this (unknown) distribution, the goal of image compression is to build a coder (that
encodes elements of $\mathbb{R}^N$) which assigns a small code to images. Typically, we want for every image $d \in \mathbb{R}^N$

$$\mathbb{P}\big(\operatorname{length}(\operatorname{code}(d)) = K\big)$$

to be as large as possible for $K$ small, and small for $K$ large. We also want the decoder to satisfy
$\operatorname{decode}(\operatorname{code}(d)) \approx d$.

The link with the problem $(\mathcal{P}_d)$, in (2), is that the current image compression standards (JPEG,
JPEG2000) encode quantized versions of the coordinates of the image in a given basis. Moreover, most of
the gain is made by choosing a basis such that the number of non-zero coordinates (after the quantization
process) is small (IHIHO]). That is, we want to solve $(\mathcal{P}_d)$ for each $\lambda_i$ belonging to a finite set of values
and for a basis $(\psi_i)_{i\in I}$. This link between image compression and $(\mathcal{P}_d)$ might seem restrictive when we
only consider a basis. It makes much more sense when we consider a redundant system of vectors $(\psi_i)_{i\in I}$.
The use of redundant dictionaries has developed strongly in the past years, see [H \17\ [181 E]
for the most famous examples. In the context of dictionaries, we know that the length of the code for
encoding $(\lambda_i)_{i\in I}$ is in general proportional to $\ell_0((\lambda_i)_{i\in I})$. The problem $(\mathcal{P}_d)$ therefore reads: minimize
the codelength of the image while constraining a given level of accuracy of the coder. This is exactly the
goal in image compression.

Finding an exact solution to $(\mathcal{P}_d)$ in large dimension (which is necessary in order to apply $(\mathcal{P}_d)$ to image
compression) still remains a challenge. In fact, the methods described in [?1[TS1[3] can be seen as heuristics
approximating $(\mathcal{P}_d)$. The link between the performance of those heuristics and the performance of $(\mathcal{P}_d)$
is not completely clear. It is also a goal of the paper to provide a means for comparing those algorithms.

1.2 Our contribution 

In this paper, we estimate the ability of the model $(\mathcal{P}_d)$ to provide a sparse representation of data which
follow a given distribution law. The distribution law is uniform in the $\theta$-level set of a norm $f_d$:

$$\mathcal{L}_{f_d}(\theta) = \{w \in \mathbb{R}^N,\ f_d(w) \le \theta\}.$$

In order to do this we

• Give a precise (and non-redundant) geometrical description of the sets

$$\Sigma^\tau(K) = \{d \in \mathbb{R}^N,\ \operatorname{val}(\mathcal{P}_d) \le K\}$$

and

$$\{d \in \mathbb{R}^N,\ \operatorname{val}(\mathcal{P}_d) = K\} \tag{5}$$

where $\operatorname{val}(\mathcal{P}_d)$ denotes $\ell_0((\lambda_i)_{i\in I})$ for a solution $(\lambda_i)_{i\in I}$ of $(\mathcal{P}_d)$, and for $K = 0, \ldots, N$, $\tau > 0$. This is
done in Theorem 1 and equation ([M]).

Remark 1 It is easy to see that $\{\psi_i : \lambda_i \neq 0\}$, for $(\lambda_i)_{i\in I}$ solving $(\mathcal{P}_d)$, forms a set of linearly
independent vectors. Therefore for all $d \in \mathbb{R}^N$ we will find a solution with at most $N$ nonzero coefficients,
even if the size of the dictionary is huge, $\#I \gg N$. So in this work we consider solutions with
sparsity $K \le N$.

• Once these sets are precisely described, we are able to bound (both from above and from below)
their measure (more precisely the measure of their intersection with $\mathcal{L}_{f_d}(\theta)$). The difference between
the upper and the lower bound is negligible when compared to $\left(\frac{\tau}{\theta}\right)^{N-K}$, when $\frac{\tau}{\theta}$ is small enough.

Moreover, these bounds show that the measures of $\Sigma^\tau(K) \cap \mathcal{L}_{f_d}(\theta)$ and $\{d : \operatorname{val}(\mathcal{P}_d) = K\} \cap \mathcal{L}_{f_d}(\theta)$ asymptotically
behave like

$$C_K\, \theta^N \left(\frac{\tau}{\theta}\right)^{N-K}$$

as $\frac{\tau}{\theta}$ goes to 0.

The constants $C_K$ are defined in (44). They are made of the sum of constants $C_V$ over all possible
vector subspaces $V$ of dimension $K$ spanned by elements of the dictionary $(\psi_i)_{i\in I}$. The constants
$C_V$ are built in Proposition 1 and Corollary 1. They have the form

$$C_V = \mathbb{L}^{N-K}\big(P_{V^\perp}(\mathcal{L}_{\|.\|}(1))\big)\ \mathbb{L}^{K}\big(V \cap \mathcal{L}_{f_d}(1)\big),$$

where $P_{V^\perp}$ is the orthogonal projection onto the orthogonal complement of $V$, $\|.\|$ is the norm
appearing in (2), and $\mathbb{L}^k$ denotes the Lebesgue measure of a set living in $\mathbb{R}^k$.

• Once this is achieved, we easily obtain lower and upper bounds for $\mathbb{P}(\operatorname{val}(\mathcal{P}_d) \le K)$ and $\mathbb{P}(\operatorname{val}(\mathcal{P}_d) = K)$
when $d$ is uniformly distributed in $\mathcal{L}_{f_d}(\theta)$ (see Section IH]). They have the same characteristics as the
bounds described above (modulo the disappearance of $\theta^N$). In order to obtain sparse representations
of the data, we should therefore tune the model (the norm $\|.\|$ and the dictionary $(\psi_i)_{i\in I}$) in order
to obtain larger constants $C_K$.

This result clearly shows that the model $(\mathcal{P}_d)$ benefits from several ingredients (which might not be
present in other models promoting sparsity):

- the sum defining $C_K$ runs over all the possible vector subspaces of dimension $K$ spanned by elements
of the dictionary $(\psi_i)_{i\in I}$;

- the term $\mathbb{L}^K(V \cap \mathcal{L}_{f_d}(1))$ in the constants $C_V$ represents the measure of the whole set $V \cap \mathcal{L}_{f_d}(1)$.

• Finally we estimate $\mathbb{E}(\operatorname{val}(\mathcal{P}_d))$ and show that its asymptotic behavior (when $\frac{\tau}{\theta}$ goes to 0) is governed by
the constant $C_{N-1}$ (see Theorem 5). Increasing this constant therefore seems to be particularly
important when building a model $(\mathcal{P}_d)$ (i.e. choosing $\|.\|$ and $(\psi_i)_{i\in I}$).

These results are illustrated in the context of particular choices for $\|.\|$ and for $f_d$ in Section [T]

1.3 Relation to other evaluations of performance

Evaluating the performance of an optimization problem like $(\mathcal{P}_d)$ for the purpose of realizing nonlinear approximation
is a very active field of research. For a good survey of the problem we refer to [5].

In that field of research a variant of $(\mathcal{P}_d)$, named "best $K$-term approximation", is under study. It
consists in looking for the best possible approximation of a datum $d \in \mathbb{R}^N$ $^{1}$ using an expansion in $(\psi_i)_{i\in I}$
with $K$ non-zero coordinates. The performance of the model is estimated using the quantity

$$\sigma_K(d) = \inf_{w \in \mathcal{V}_K} \|w - d\|,$$

where $\mathcal{V}_K$ denotes the union of all the vector spaces of dimension $K$ spanned by elements of $(\psi_i)_{i\in I}$, for
$K = 0, \ldots, N$. Expressed with our notations, the typical object under consideration is

$$\mathcal{A}^\alpha(C) = \bigcap_{K=1}^{N} \Sigma^{C/K^\alpha}(K)$$

for $C > 0$ and $\alpha > 0$, and $\Sigma^\tau(K)$ defined by (5). That is, the data $d$ obeying

$$\sigma_K(d) \le \frac{C}{K^\alpha}, \quad \text{for all } K = 1, \ldots, N.$$

The typical results obtained there take the form

$$\mathcal{A}^\alpha(C_1) \subset \mathcal{K}_\eta \subset \mathcal{A}^\alpha(C_2), \tag{6}$$

for $C_2 > C_1 > 0$ and the level set

$$\mathcal{K}_\eta = \{d \in \mathbb{R}^N,\ \|d\|_\eta \le 1\},$$

for a norm $\|.\|_\eta$ characterizing the regularity of $d$ (again, the theory is in infinite dimensional vector spaces).
This makes it possible to estimate the number of coordinates needed to represent a datum $d$, if we know its
regularity. Typically, the link between $\alpha$ and $\eta$ says how good the basis (or more generally a dictionary) is
at representing the data class.

The clear advantage of these results over ours is that they apply even if one only has a vague knowledge
of the data distribution. For instance, any data distribution whose support is included in $\mathcal{K}_\eta$ does enjoy
the decay $\sigma_K(d) \le C_2/K^\alpha$. The inclusions in (6) need indeed to be true for the worst elements of $\mathcal{K}_\eta$ (even if they are
rare). The counterpart of this advantage is that the constants $C_1$ and $C_2$ might be pessimistic.

Finally, as far as we know, the analysis proposed in nonlinear approximation does not permit (today)
to clearly assess the differences between $(\mathcal{P}_d)$ and its heuristics (in particular Basis Pursuit Denoising
[3] and Orthogonal Matching Pursuit [18]). This is a clear advantage of the method for assessing model
performances proposed in this paper. Indeed, similar analyses have already been conducted in [16, 13, 15]
in the context of the compression scheme described in [Ti], Basis Pursuit Denoising and total variation
regularization. (However, concerning the papers on Basis Pursuit Denoising and the total variation
regularization, the results are stated for another asymptotic and the analysis partly needs to be rewritten in
the proper context.)

1.4 Notations 

For any function $f : \mathbb{R}^N \to \mathbb{R}$, and any $\theta \in \mathbb{R}$, the $\theta$-level set of $f$ is denoted by

$$\mathcal{L}_f(\theta) = \{w \in \mathbb{R}^N,\ f(w) \le \theta\}. \tag{7}$$

For any vector subspace $V$ of $\mathbb{R}^N$, we denote by $P_V$ the orthogonal projection onto $V$ and by $V^\perp$ the orthogonal
complement of $V$ in $\mathbb{R}^N$. To specify the dimension of $V$, we write $\dim(V)$. The Euclidean norm of any
$u \in \mathbb{R}^N$ is systematically denoted by $\|u\|_2$. The notation $\|u\|$ is devoted to a general norm on $\mathbb{R}^N$. For
any integer $K > 0$, the Lebesgue measure on $\mathbb{R}^K$ is systematically denoted by $\mathbb{L}^K(.)$, whereas $I_K$ stands
for the $K \times K$ identity matrix. We write $\mathbb{P}(.)$ for probability and $\mathbb{E}(.)$ for expectation.
As usual, we write $o(t)$ for a function satisfying $\lim_{t\to 0} \frac{o(t)}{t} = 0$.

For any $d \in \mathbb{R}^N$, we denote by $\operatorname{val}(\mathcal{P}_d)$ the value of the minimum in $(\mathcal{P}_d)$, i.e. $\ell_0((\lambda_i)_{i\in I})$ for $(\lambda_i)_{i\in I}$
solving $(\mathcal{P}_d)$.

$^{1}$ In nonlinear approximation, authors usually consider infinite dimensional spaces.

2 Measuring bounded cylinder-like subsets of $\mathbb{R}^N$
2.1 Preliminary results

Below we give several statements that will be used many times in the rest of the work.

Lemma 1 For any vector subspace $V \subset \mathbb{R}^N$ and any norm $\|.\|$ on $\mathbb{R}^N$, define the application

$$h : V^\perp \to \mathbb{R}, \qquad u \mapsto h(u) = \inf\big\{t > 0 : u \in P_{V^\perp}\big(\mathcal{L}_{\|.\|}(t)\big)\big\}. \tag{8}$$

Then the following holds:
(i) For any $\tau > 0$, we have

$$\mathcal{L}_h(\tau) = P_{V^\perp}\big(\mathcal{L}_{\|.\|}(\tau)\big). \tag{9}$$

(ii) The application $h$ in (8) is a norm on $V^\perp$.

(iii) For any norm $f_d$ on $\mathbb{R}^N$, let $\delta_1 > 0$, $\delta_2 > 0$ and $\Delta$ be some constants satisfying

$$\forall w \in \mathbb{R}^N, \quad f_d(w) \le \delta_1\|w\|_2 \quad\text{and}\quad \|w\|_2 \le \delta_2\|w\|, \tag{10}$$
$$\Delta = \delta_1\delta_2. \tag{11}$$

The constants $\delta_1$, $\delta_2$ and $\Delta > 0$ are independent of $V$ and we have

$$f_d(u) \le \Delta\, h(u), \quad \forall u \in V^\perp, \tag{12}$$
$$\|u\|_2 \le \delta_2\, h(u), \quad \forall u \in V^\perp. \tag{13}$$

Remark 2 The constants in (10) come from the fact that all norms on a finite-dimensional space are
equivalent. In practice we will choose the smallest constants satisfying these inequalities.

Proof. The case $V = \{0\}$ is trivial (we obtain $h = \|.\|$) and we further assume that $\dim(V) \ge 1$.

Assertion (i). The set $P_{V^\perp}(\mathcal{L}_{\|.\|}(1))$ is convex since $\|.\|$ is a norm and $P_{V^\perp}$ is linear. Moreover, the origin
belongs to its interior. Indeed, there is $\varepsilon > 0$ such that if $w \in \mathbb{R}^N$ satisfies $\|w\|_2 < \varepsilon$, then $\|w\| < 1$.
Consequently $0 \in \operatorname{Int}(\mathcal{L}_{\|.\|_2}(\varepsilon)) \subset \mathcal{L}_{\|.\|}(1)$. Using that $\|.\|_2$ is rotationally invariant and that $P_{V^\perp}$ is a
contraction, we deduce that $0 \in \operatorname{Int}\big(P_{V^\perp}(\mathcal{L}_{\|.\|_2}(\varepsilon))\big) \subset P_{V^\perp}(\mathcal{L}_{\|.\|}(1))$. Then the application $h : V^\perp \to \mathbb{R}$
in (8) is the usual Minkowski functional of $P_{V^\perp}(\mathcal{L}_{\|.\|}(1))$, as defined and commented in [12, p. 131]. Since
$P_{V^\perp}(\mathcal{L}_{\|.\|}(1))$ is closed, we have

$$P_{V^\perp}\big(\mathcal{L}_{\|.\|}(1)\big) = \{u \in V^\perp : h(u) \le 1\}.$$

Using that the Minkowski functional is positively homogeneous, i.e.

$$h(\tau u) = \tau h(u), \quad \forall \tau > 0,$$

leads to (9).

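Assertion (ii). For $h$ to be a norm, we have to show that the latter property holds for any $\lambda \in \mathbb{R}$ (i.e.
that $h$ is symmetric with respect to the origin). It is true since, for any $\lambda \in \mathbb{R}$, $\lambda \neq 0$,

$$\begin{aligned} h(\lambda u) &= \inf\big\{t > 0 : \lambda u \in P_{V^\perp}(\mathcal{L}_{\|.\|}(t))\big\} \\ &= \inf\big\{t > 0 : u \in P_{V^\perp}\big(\mathcal{L}_{\|.\|}(t/|\lambda|)\big)\big\} \\ &= |\lambda| \inf\big\{t > 0 : u \in P_{V^\perp}(\mathcal{L}_{\|.\|}(t))\big\} \quad (\text{writing } t \text{ for } t/|\lambda|) \\ &= |\lambda|\, h(u), \end{aligned}$$

where we use the facts that $P_{V^\perp}$ is linear and that $\|.\|$ is a norm. It is well known that the Minkowski
functional is non-negative, finite, and satisfies$^{2}$ $h(u + v) \le h(u) + h(v)$ for any $u, v \in V^\perp$.
Finally, since $\mathcal{L}_h(0) = P_{V^\perp}(\mathcal{L}_{\|.\|}(0)) = \{0\}$,

$$h(u) = 0 \iff u = 0.$$

Consequently, $h$ defines a norm on $V^\perp$.

Assertion (iii). Let us first remark that

$$\mathcal{L}_{\|.\|}(1) \subset \mathcal{L}_{\|.\|_2}(\delta_2) \subset \mathcal{L}_{f_d}(\delta_1\delta_2) = \mathcal{L}_{f_d}(\Delta),$$

where $\delta_1$ and $\delta_2$ are defined in the lemma. Using that $\|.\|_2$ is rotationally invariant, we have

$$\mathcal{L}_h(1) = P_{V^\perp}\big(\mathcal{L}_{\|.\|}(1)\big) \subset P_{V^\perp}\big(\mathcal{L}_{\|.\|_2}(\delta_2)\big) = \mathcal{L}_{\|.\|_2}(\delta_2) \cap V^\perp.$$

We will prove (12) and (13) jointly. To this end let us consider a norm $g$ on $\mathbb{R}^N$ and $\delta > 0$ such that

$$P_{V^\perp}\big(\mathcal{L}_{\|.\|}(1)\big) \subset \mathcal{L}_g(\delta) \cap V^\perp. \tag{14}$$

Using that each norm can be expressed as a Minkowski functional, for any $u \in V^\perp$ we can write down the
following:

$$\begin{aligned} g(u) &= \inf\{t > 0 : g(u/t) \le 1\} \\ &= \inf\{t > 0 : g(\delta u/t) \le \delta\} \\ &= \delta \inf\{t > 0 : g(u/t) \le \delta\} \quad (\text{write } t \text{ for } t/\delta) \\ &= \delta \inf\{t > 0 : u/t \in \mathcal{L}_g(\delta)\} \\ &\le \delta \inf\big\{t > 0 : u/t \in P_{V^\perp}(\mathcal{L}_{\|.\|}(1))\big\} \qquad (15) \\ &= \delta\, h(u), \end{aligned}$$

where the inequality in (15) comes from (14).

If we identify $g$ with $f_d$ and $\delta$ with $\Delta$, we obtain (12). Similarly, identifying $g$ with $\|.\|_2$ and $\delta$ with $\delta_2$
yields (13). This concludes the proof. □

$^{2}$ For completeness, we give the details:

$$\begin{aligned} h(u + v) &= \inf\big\{t > 0 : (u + v) \in P_{V^\perp}(\mathcal{L}_{\|.\|}(t))\big\} \\ &\le \inf\big\{t > 0 : u \in P_{V^\perp}(\mathcal{L}_{\|.\|}(t))\big\} + \inf\big\{t > 0 : v \in P_{V^\perp}(\mathcal{L}_{\|.\|}(t))\big\} = h(u) + h(v). \end{aligned}$$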
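As a small illustration of the norm $h$ of Lemma 1, the sketch below treats one special case that is chosen here for illustration and is not taken from the text: $\|.\|$ is the $\ell_\infty$ norm and $V$ is a hyperplane, so that $V^\perp$ is spanned by a single unit vector $e$. The projection of the $\ell_\infty$ ball of radius $t$ onto $V^\perp$ is then the segment $\{s e : |s| \le t\|e\|_1\}$ (the $\ell_1$ norm being dual to $\ell_\infty$), hence $h(se) = |s| / \|e\|_1$. The function name `h_hyperplane_linf` is hypothetical.

```python
import math


def h_hyperplane_linf(u, normal):
    """Minkowski functional h(u) of the projection of the unit l_inf ball onto
    the line spanned by `normal`. Assumes u is a multiple of `normal`."""
    n2 = math.sqrt(sum(x * x for x in normal))
    e = [x / n2 for x in normal]                 # unit vector spanning V^perp
    s = sum(x * y for x, y in zip(u, e))         # coordinate of u along e
    dual = sum(abs(x) for x in e)                # l_1 norm of e (dual of l_inf)
    return abs(s) / dual


u = (1.0, -1.0)
h_u = h_hyperplane_linf(u, (1.0, -1.0))
norm2_u = math.sqrt(sum(x * x for x in u))
delta2 = math.sqrt(len(u))                       # ||w||_2 <= sqrt(N) ||w||_inf
print(h_u, norm2_u <= delta2 * h_u + 1e-12)      # second value checks inequality (13)
```

In this example the inequality $\|u\|_2 \le \delta_2 h(u)$ of (13) holds with equality, which shows that the constant $\delta_2 = \sqrt{N}$ cannot be improved for this choice of norm and subspace.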



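The next proposition addresses sets of $\mathbb{R}^N$ bounded with the aid of $f_d$.

Proposition 1 For any vector subspace $V$ of $\mathbb{R}^N$, any norm $\|.\|$ on $\mathbb{R}^N$ and any $\tau > 0$, define

$$V^\tau = V + P_{V^\perp}\big(\mathcal{L}_{\|.\|}(\tau)\big). \tag{16}$$

Then the following hold:
(i) $V^\tau$ is closed and measurable;

(ii) Let $f_d$ be any norm on $\mathbb{R}^N$, $h : V^\perp \to \mathbb{R}$ the norm defined in Lemma 1, $K = \dim(V)$ and $\delta_V$ be any
constant such that

$$f_d(u) \le \delta_V\, h(u), \quad \forall u \in V^\perp. \tag{17}$$

If $\theta \ge \delta_V\tau$, then

$$C\,\tau^{N-K}(\theta - \delta_V\tau)^K \le \mathbb{L}^N\big(V^\tau \cap \mathcal{L}_{f_d}(\theta)\big) \le C\,\tau^{N-K}(\theta + \delta_V\tau)^K, \tag{18}$$

where

$$C = \mathbb{L}^{N-K}\big(P_{V^\perp}(\mathcal{L}_{\|.\|}(1))\big)\ \mathbb{L}^{K}\big(V \cap \mathcal{L}_{f_d}(1)\big) \in (0, +\infty). \tag{19}$$

Remark 3 Using Lemma 1, the condition in (17) holds for every $\delta_V$ larger than or equal to some constant in $[0, \Delta]$, where $\Delta$ is
given in (11). Let us emphasize that $\delta_V$ may depend on $V$ (which explains the letter "V" in index). The
proposition clearly holds if we take $\delta_V = \Delta$, the constant of Lemma 1, assertion (iii), which is independent
of the choice of $V$.

Observe that $C$ is a positive, finite constant that depends only on $V$, $\|.\|$ and $f_d$.

Remark 4 An important consequence of this proposition is that asymptotically

$$\mathbb{L}^N\big(V^\tau \cap \mathcal{L}_{f_d}(\theta)\big) = C\,\theta^N\left(\frac{\tau}{\theta}\right)^{N-K} + \theta^N\, o\!\left(\left(\frac{\tau}{\theta}\right)^{N-K}\right) \quad\text{as } \frac{\tau}{\theta} \to 0.$$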



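Proof. The sets $V$ and $P_{V^\perp}(\mathcal{L}_{\|.\|}(\tau))$ are closed. Moreover, $V$ and $P_{V^\perp}(\mathcal{L}_{\|.\|}(\tau))$ are orthogonal. Therefore
$V^\tau$ is closed. As a consequence $V^\tau$ is a Borel set and is Lebesgue measurable.
Since the restriction of $f_d$ to $V^\perp$ is a norm on $V^\perp$, there exists $\delta_V$ such that (see Remark 3)

$$f_d(u) \le \delta_V\, h(u), \quad \forall u \in V^\perp, \tag{20}$$

where $h$ is given in (8) in Lemma 1. By (12) in Lemma 1 such a $\delta_V$ exists in $[0, \Delta]$. To simplify the
notations, in the rest of the proof we will write $\delta$ for $\delta_V$.
For any $u \in V^\perp$ and $v \in V$, using (20) we have

$$f_d(v) - \delta h(u) \le f_d(v) - f_d(u) \le f_d(u + v) \le f_d(v) + f_d(u) \le f_d(v) + \delta h(u).$$

In particular, for $h(u) \le \tau$, we get

$$f_d(v) - \delta\tau \le f_d(u + v) \le f_d(v) + \delta\tau. \tag{21}$$

As required in assertion (ii), we have $\theta - \delta\tau \ge 0$. If in addition $v \in V$ is such that $f_d(v) \le \theta - \delta\tau$, then
$f_d(u + v) \le \theta$. Noticing that

$$V^\tau \cap \mathcal{L}_{f_d}(\theta) = \{u + v : (u, v) \in (V^\perp \times V),\ h(u) \le \tau,\ f_d(u + v) \le \theta\},$$

this implies that

$$B_0 \stackrel{\rm def}{=} \{u + v : (u, v) \in (V^\perp \times V),\ h(u) \le \tau,\ f_d(v) \le \theta - \delta\tau\} \subset V^\tau \cap \mathcal{L}_{f_d}(\theta).$$

Conversely, if $f_d(u + v) \le \theta$ (see the set we wish to measure in (18)), then the left-hand side of (21) shows
that $f_d(v) \le \theta + \delta\tau$, hence

$$B_1 \stackrel{\rm def}{=} \{u + v : (u, v) \in (V^\perp \times V),\ h(u) \le \tau,\ f_d(v) \le \theta + \delta\tau\} \supset V^\tau \cap \mathcal{L}_{f_d}(\theta).$$

Consider the pair of applications

$$\phi_0 : \mathcal{L}_h(1) \times (V \cap \mathcal{L}_{f_d}(1)) \to \mathbb{R}^N, \qquad (u, v) \mapsto \tau u + (\theta - \delta\tau)v$$

and

$$\phi_1 : \mathcal{L}_h(1) \times (V \cap \mathcal{L}_{f_d}(1)) \to \mathbb{R}^N, \qquad (u, v) \mapsto \tau u + (\theta + \delta\tau)v.$$

Clearly, $\phi_i$ is a Lipschitz homeomorphism satisfying $\phi_i\big(\mathcal{L}_h(1) \times (V \cap \mathcal{L}_{f_d}(1))\big) = B_i$ for $i \in \{0, 1\}$.
Moreover, we have

$$D\phi_0 = \begin{pmatrix} \tau I_{N-K} & 0 \\ 0 & (\theta - \delta\tau) I_K \end{pmatrix} \quad\text{and}\quad D\phi_1 = \begin{pmatrix} \tau I_{N-K} & 0 \\ 0 & (\theta + \delta\tau) I_K \end{pmatrix}.$$

Then $\mathbb{L}^N(B_i)$ can be computed using (see [7] for details)

$$\mathbb{L}^N(B_i) = \int_{u \in \mathcal{L}_h(1)} \int_{v \in V \cap \mathcal{L}_{f_d}(1)} |J\phi_i|\, dv\, du,$$

where $J\phi_i$ is the Jacobian of $\phi_i$, for $i = 0$ or $i = 1$. In particular,

$$|J\phi_0| = \det(D\phi_0) = \tau^{N-K}(\theta - \delta\tau)^K \quad\text{and}\quad |J\phi_1| = \det(D\phi_1) = \tau^{N-K}(\theta + \delta\tau)^K.$$

It follows that

$$\mathbb{L}^N(B_0) = C\tau^{N-K}(\theta - \delta\tau)^K \quad\text{and}\quad \mathbb{L}^N(B_1) = C\tau^{N-K}(\theta + \delta\tau)^K,$$

where the constant

$$C = \int_{u \in \mathcal{L}_h(1)} du \int_{v \in V \cap \mathcal{L}_{f_d}(1)} dv = \mathbb{L}^{N-K}\big(P_{V^\perp}(\mathcal{L}_{\|.\|}(1))\big)\ \mathbb{L}^{K}\big(V \cap \mathcal{L}_{f_d}(1)\big).$$

Clearly $C$ is positive and finite. Using the inclusion $B_0 \subset V^\tau \cap \mathcal{L}_{f_d}(\theta) \subset B_1$ shows that

$$C\tau^{N-K}(\theta - \delta\tau)^K \le \mathbb{L}^N\big(V^\tau \cap \mathcal{L}_{f_d}(\theta)\big) \le C\tau^{N-K}(\theta + \delta\tau)^K.$$

The proof is complete. □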
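The two-sided bound of Proposition 1 can be checked by hand in a simple setting. The sketch below is only an illustration under assumptions made here: $N = 2$, $V$ is the horizontal axis, and both $\|.\|$ and $f_d$ are the Euclidean norm, so that $K = 1$, $h = |.|$ on $V^\perp$, $\delta_V = 1$ and $C = 2 \cdot 2 = 4$. In that case $V^\tau \cap \mathcal{L}_{f_d}(\theta)$ is a strip of half-width $\tau$ cut by a disc of radius $\theta$, whose area has a closed form. The function names are hypothetical.

```python
import math


def strip_in_disc_area(tau, theta):
    """Exact area of {(x, y) : |y| <= tau, x^2 + y^2 <= theta^2}, for 0 < tau <= theta."""
    return (2 * tau * math.sqrt(theta**2 - tau**2)
            + 2 * theta**2 * math.asin(tau / theta))


def proposition1_bounds(tau, theta, C=4.0, delta=1.0, N=2, K=1):
    """Lower and upper bounds of Proposition 1 for the constants of this example."""
    lower = C * tau ** (N - K) * (theta - delta * tau) ** K
    upper = C * tau ** (N - K) * (theta + delta * tau) ** K
    return lower, upper


for tau, theta in [(0.1, 1.0), (0.5, 1.0), (1.0, 1.0), (0.3, 2.0)]:
    lower, upper = proposition1_bounds(tau, theta)
    area = strip_in_disc_area(tau, theta)
    print(tau, theta, lower <= area <= upper)
```

As $\tau/\theta$ decreases, the exact area approaches $C\,\theta^N(\tau/\theta)^{N-K} = 4\tau\theta$, in line with the asymptotic statement of Remark 4.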



2.2 Sets built from a dictionary 

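With every $J \subset I$, we associate the vector subspace $\mathcal{T}_J$ defined below:

$$\mathcal{T}_J \stackrel{\rm def}{=} \operatorname{span}\big((\psi_j)_{j\in J}\big), \tag{22}$$

along with the convention $\operatorname{span}(\emptyset) = \{0\}$. Given an arbitrary $\tau > 0$, we introduce the subset of $\mathbb{R}^N$

$$\mathcal{T}_J^\tau \stackrel{\rm def}{=} \mathcal{T}_J + P_{\mathcal{T}_J^\perp}\big(\mathcal{L}_{\|.\|}(\tau)\big), \tag{23}$$

where we recall that $\mathcal{T}_J^\perp$ is the orthogonal complement of $\mathcal{T}_J$ in $\mathbb{R}^N$ and $\|.\|$ is any norm on $\mathbb{R}^N$. These
notations are constantly used in what follows.

The next assertion is a direct consequence of Proposition 1. It is illustrated on Figure 1.

Corollary 1 For any $J \subset I$ (including $J = \emptyset$), any norm $\|.\|$ and any $\tau > 0$ the following hold:

(i) $\mathcal{T}_J^\tau$ is closed and measurable;

(ii) Let $f_d$ be any norm on $\mathbb{R}^N$ and $K \stackrel{\rm def}{=} \dim(\mathcal{T}_J)$. Then there exists $\delta_J \in [0, \Delta]$ (where $\Delta$ is given in
Lemma 1 (iii)) such that for $\theta \ge \delta_J\tau$ we have

$$C_J\,\tau^{N-K}(\theta - \delta_J\tau)^K \le \mathbb{L}^N\big(\mathcal{T}_J^\tau \cap \mathcal{L}_{f_d}(\theta)\big) \le C_J\,\tau^{N-K}(\theta + \delta_J\tau)^K, \tag{24}$$

where

$$C_J = \mathbb{L}^{N-K}\big(P_{\mathcal{T}_J^\perp}(\mathcal{L}_{\|.\|}(1))\big)\ \mathbb{L}^{K}\big(\mathcal{T}_J \cap \mathcal{L}_{f_d}(1)\big) \in (0, +\infty). \tag{25}$$

Proof. The corollary is a direct consequence of Proposition 1. Notice that we now write $\delta_J$ for the constant
$\delta_{\mathcal{T}_J}$ in Lemma 1. □

Figure 1: Example in dimension 2. Let the dictionary read $\{\psi_1, \psi_2, \psi_3, \psi_4\}$. On the drawing, the sets
$P_{\mathcal{T}_{\{i\}}^\perp}(\mathcal{L}_{\|.\|}(\tau))$, for $i = 2, 3, 4$, are shifted by an element of $\mathcal{T}_{\{i\}}$. The dotted sets represent translations of
$\mathcal{L}_{\|.\|}(\tau)$. The set-valued function $\Sigma^\tau(.)$, as presented in (5) and Proposition 3, gives rise to the following
situations: $\Sigma^\tau(0) = \mathcal{L}_{\|.\|}(\tau) = \mathcal{T}_\emptyset^\tau$; $\Sigma^\tau(1)$ is a union of three sets of the form $\mathcal{T}_{\{i\}}^\tau$; and $\Sigma^\tau(2) = \mathbb{R}^2$ coincides
with every $\mathcal{T}_J^\tau$ such that $\dim(\mathcal{T}_J) = 2$. The symbol $\partial$ is used to denote the boundaries of the sets.

It can be useful to recall that $\Delta$ is defined in Lemma 1 and only depends on $\|.\|$ and $f_d$.

A more friendly expression for $\mathcal{T}_J^\tau$ is provided by the lemma below. Again, the lemma is illustrated on
Figure 1.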

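Lemma 2 For any $J \subset I$ (including $J = \emptyset$), any norm $\|.\|$ and $\tau > 0$, let $\mathcal{T}_J^\tau$ be defined by (23). Then

$$\mathcal{T}_J^\tau = \mathcal{T}_J + \mathcal{L}_{\|.\|}(\tau).$$

Proof. The case $J = \emptyset$ is trivial because of the convention $\operatorname{span}(\emptyset) = \{0\}$. Consider next that $J$ is
nonempty. Let $w \in \mathcal{T}_J^\tau$; then $w$ admits a unique decomposition as

$$w = v + u \quad\text{where } v \in \mathcal{T}_J \text{ and } u \in \mathcal{T}_J^\perp.$$

If $\|u\| \le \tau$ then clearly $w \in \mathcal{T}_J + \mathcal{L}_{\|.\|}(\tau)$. Consider next that $\|u\| > \tau$. From the definition of $\mathcal{T}_J^\tau$, there
exists $w_u \in \mathcal{L}_{\|.\|}(\tau)$ such that $P_{\mathcal{T}_J^\perp}(w_u) = u$. Noticing that $u - w_u = P_{\mathcal{T}_J^\perp}(w_u) - w_u \in \mathcal{T}_J$ and that
$v + u - w_u \in \mathcal{T}_J$, we can see that

$$w = (v + u - w_u) + w_u \in \mathcal{T}_J + \mathcal{L}_{\|.\|}(\tau).$$

Conversely, let $w \in \mathcal{T}_J + \mathcal{L}_{\|.\|}(\tau)$. Then

$$w = v_1 + v \quad\text{where } v_1 \in \mathcal{T}_J \text{ and } v \in \mathcal{L}_{\|.\|}(\tau).$$

Furthermore, $v$ has a unique decomposition of the form

$$v = v_2 + u \quad\text{where } v_2 \in \mathcal{T}_J \text{ and } u \in \mathcal{T}_J^\perp.$$

In particular,

$$u = P_{\mathcal{T}_J^\perp}(v) \in P_{\mathcal{T}_J^\perp}\big(\mathcal{L}_{\|.\|}(\tau)\big).$$

Combining this with the fact that $v_1 + v_2 \in \mathcal{T}_J$ shows that $w = (v_1 + v_2) + u \in \mathcal{T}_J^\tau$. □

Figure 2: Example of an intersection in dimension 3. $\mathcal{T}_{\{1,2\}}^\tau$ is in between two planes, parallel to $\mathcal{T}_{\{1,2\}}$.
Same remark for $\mathcal{T}_{\{3,4\}}^\tau$. The set $\mathcal{T}_{\{1,2\}}^\tau \cap \mathcal{T}_{\{3,4\}}^\tau$ is of the form $W + P_{W^\perp}\mathcal{L}_g(\tau)$, where $g$ is a norm and
$W = \mathcal{T}_{\{1,2\}} \cap \mathcal{T}_{\{3,4\}}$. We also have $\dim(\mathcal{T}_{\{1,2\}} \cap \mathcal{T}_{\{3,4\}}) < \dim(\mathcal{T}_{\{1,2\}}) = \dim(\mathcal{T}_{\{3,4\}})$.

3 The intersection of two cylinder-like subsets is small

This section is devoted to proving a quite intuitive result on the estimate of the intersection of two sets $\mathcal{T}_J^\tau$.
It uses all notations introduced in Section 2.2 and is illustrated on Figure 2.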

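Proposition 2 Let $J_1 \subset I$ and $J_2 \subset I$ be such that $\mathcal{T}_{J_1} \neq \mathcal{T}_{J_2}$ and $\dim(\mathcal{T}_{J_1}) = \dim(\mathcal{T}_{J_2}) \stackrel{\rm def}{=} K$. Let $\tau > 0$
and $\theta > 0$. Then the set given below

$$\mathcal{T}_{J_1}^\tau \cap \mathcal{T}_{J_2}^\tau \cap \mathcal{L}_{f_d}(\theta) \tag{26}$$

is closed and measurable. Moreover, there is a constant $\delta_{J_1,J_2} \in [0, 3\Delta]$ (where $\Delta$ is given in Lemma 1 (iii))
such that for $\theta \ge \delta_{J_1,J_2}\tau$ we have

$$\mathbb{L}^N\big(\mathcal{T}_{J_1}^\tau \cap \mathcal{T}_{J_2}^\tau \cap \mathcal{L}_{f_d}(\theta)\big) \le Q_{J_1,J_2}\,\tau^{N-k}(\theta + \delta_{J_1,J_2}\tau)^k, \qquad k = \dim(\mathcal{T}_{J_1} \cap \mathcal{T}_{J_2}),$$

where the constant $Q_{J_1,J_2}$ reads

$$Q_{J_1,J_2} = \mathbb{L}^{N-k}\big(W^\perp \cap \mathcal{L}_{\|.\|_2}(2\delta_2)\big)\ \mathbb{L}^{k}\big(W \cap \mathcal{L}_{f_d}(1)\big) \quad\text{for } W = \mathcal{T}_{J_1} \cap \mathcal{T}_{J_2}. \tag{27}$$

Notice that $Q_{J_1,J_2}$ depends only on $(\psi_j)_{j\in J_1}$ and $(\psi_j)_{j\in J_2}$, and the norms $\|.\|$ and $f_d$. A tighter bound
can be found in the proof of the proposition (see equation (37)). The bound is expressed in terms of a
norm $g$ constructed there.

Remark 5 Since $k = \dim(W) \le K - 1$, we have the following asymptotical result:

$$\mathbb{L}^N\big(\mathcal{T}_{J_1}^\tau \cap \mathcal{T}_{J_2}^\tau \cap \mathcal{L}_{f_d}(\theta)\big) = \theta^N\, o\!\left(\left(\frac{\tau}{\theta}\right)^{N-K}\right) \quad\text{as } \frac{\tau}{\theta} \to 0.$$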



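Proof. The subset in (26) is closed and measurable, as being a finite intersection of closed measurable sets.
Let

$$h_1 : \mathcal{T}_{J_1}^\perp \to \mathbb{R} \quad\text{and}\quad h_2 : \mathcal{T}_{J_2}^\perp \to \mathbb{R}$$

be the norms exhibited in Lemma 1 (see equation (8)) such that for any $\tau > 0$,

$$\mathcal{L}_{h_1}(\tau) = P_{\mathcal{T}_{J_1}^\perp}\big(\mathcal{L}_{\|.\|}(\tau)\big) \quad\text{and}\quad \mathcal{L}_{h_2}(\tau) = P_{\mathcal{T}_{J_2}^\perp}\big(\mathcal{L}_{\|.\|}(\tau)\big).$$

Reminding that by definition

$$W = \mathcal{T}_{J_1} \cap \mathcal{T}_{J_2},$$

De Morgan's law shows that

$$W^\perp = \mathcal{T}_{J_1}^\perp + \mathcal{T}_{J_2}^\perp.$$

Below we express the latter sum as a direct sum of subspaces:

$$W^\perp = \big(\mathcal{T}_{J_1}^\perp \cap \mathcal{T}_{J_2}^\perp\big) \oplus \big(\mathcal{T}_{J_1}^\perp \cap \mathcal{T}_{J_2}\big) \oplus \big(\mathcal{T}_{J_1} \cap \mathcal{T}_{J_2}^\perp\big). \tag{28}$$

Notice that we have

$$\mathcal{T}_{J_1}^\perp = \big(\mathcal{T}_{J_1}^\perp \cap \mathcal{T}_{J_2}^\perp\big) \oplus \big(\mathcal{T}_{J_1}^\perp \cap \mathcal{T}_{J_2}\big), \tag{29}$$

as well as

$$\mathcal{T}_{J_2}^\perp = \big(\mathcal{T}_{J_1}^\perp \cap \mathcal{T}_{J_2}^\perp\big) \oplus \big(\mathcal{T}_{J_1} \cap \mathcal{T}_{J_2}^\perp\big). \tag{30}$$

From (28), any $u \in W^\perp$ has a unique decomposition as

$$u = u_1 + u_2 + u_3 \quad\text{where } u_1 \in \mathcal{T}_{J_1}^\perp \cap \mathcal{T}_{J_2}^\perp,\ u_2 \in \mathcal{T}_{J_1}^\perp \cap \mathcal{T}_{J_2},\ u_3 \in \mathcal{T}_{J_1} \cap \mathcal{T}_{J_2}^\perp. \tag{31}$$

Using these notations, we introduce the following function:

$$u \mapsto g(u) = \sup\{h_1(u_1 + u_2),\ h_2(u_1 + u_3)\}. \tag{32}$$

In the next lines we show that $g$ is a norm on $W^\perp$:

• $h_1$ and $h_2$ being norms, $g(\lambda u) = |\lambda| g(u)$, for all $\lambda \in \mathbb{R}$;

• if $g(u) = 0$ then $u_1 + u_2 = u_1 + u_3 = 0$; noticing that $u_1 \perp u_2$ and that $u_1 \perp u_3$ yields $u = 0$;

• for $u \in W^\perp$ and $v \in W^\perp$ (both decomposed according to (31)),

$$\begin{aligned} g(u + v) &= \sup\{h_1(u_1 + u_2 + v_1 + v_2),\ h_2(u_1 + u_3 + v_1 + v_3)\} \\ &\le \sup\{h_1(u_1 + u_2) + h_1(v_1 + v_2),\ h_2(u_1 + u_3) + h_2(v_1 + v_3)\} \\ &\le \sup\{h_1(u_1 + u_2),\ h_2(u_1 + u_3)\} + \sup\{h_1(v_1 + v_2),\ h_2(v_1 + v_3)\} \\ &= g(u) + g(v). \end{aligned}$$

Furthermore, $g$ can be extended to a norm $\bar g$ on $\mathbb{R}^N$ such that $\forall u \in W^\perp$, we have $\bar g(u) = g(u)$ and

$$\mathcal{L}_g(\tau) = P_{W^\perp}\big(\mathcal{L}_{\bar g}(\tau)\big), \quad \forall \tau > 0. \tag{33}$$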

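Let us then define

$$\begin{aligned} W^\tau &= W + P_{W^\perp}\big(\mathcal{L}_{\bar g}(\tau)\big) \\ &= \{u + w : (u, w) \in (W^\perp \times W),\ g(u) \le \tau\}. \end{aligned} \tag{34}$$

We are going to show that $(\mathcal{T}_{J_1}^\tau \cap \mathcal{T}_{J_2}^\tau) \subset W^\tau$. In order to do so, we consider an arbitrary

$$v \in \mathcal{T}_{J_1}^\tau \cap \mathcal{T}_{J_2}^\tau. \tag{35}$$

It admits a unique decomposition of the form

$$v = w + u_1 + u_2 + u_3,$$

where $w \in W$, and $u_1$, $u_2$ and $u_3$ are decomposed according to (31). The latter, combined with (29) and
(30), shows that

$$u_1 + u_2 \in \mathcal{T}_{J_1}^\perp \quad\text{and}\quad w + u_3 \in \mathcal{T}_{J_1},$$
$$u_1 + u_3 \in \mathcal{T}_{J_2}^\perp \quad\text{and}\quad w + u_2 \in \mathcal{T}_{J_2}.$$

The inclusions given above, combined with (35), show that

$$h_1(u_1 + u_2) \le \tau \quad\text{and}\quad h_2(u_1 + u_3) \le \tau.$$

By the definition of $g$ in (31)-(32), the inequalities given above imply that $g(u) \le \tau$ for $u = u_1 + u_2 + u_3$. Combining this with
the definition of $W^\tau$ in (34) entails that $v \in W^\tau$. Consequently,

$$\big(\mathcal{T}_{J_1}^\tau \cap \mathcal{T}_{J_2}^\tau\big) \subset W^\tau \quad\text{and}\quad \big(\mathcal{T}_{J_1}^\tau \cap \mathcal{T}_{J_2}^\tau \cap \mathcal{L}_{f_d}(\theta)\big) \subset \big(W^\tau \cap \mathcal{L}_{f_d}(\theta)\big).$$

It follows that

$$\mathbb{L}^N\big(\mathcal{T}_{J_1}^\tau \cap \mathcal{T}_{J_2}^\tau \cap \mathcal{L}_{f_d}(\theta)\big) \le \mathbb{L}^N\big(W^\tau \cap \mathcal{L}_{f_d}(\theta)\big).$$

Applying now the right-hand side of (18) in Proposition 1 with $W^\tau$ in place of $V^\tau$ and taking $\delta_{J_1,J_2}$ such
that

$$f_d(u) \le \delta_{J_1,J_2}\, g(u), \quad \forall u \in W^\perp, \tag{36}$$

leads to

$$\mathbb{L}^N\big(W^\tau \cap \mathcal{L}_{f_d}(\theta)\big) \le Q'_{J_1,J_2}\,\tau^{N-k}(\theta + \delta_{J_1,J_2}\tau)^k,$$

where it is easy to see that

$$Q'_{J_1,J_2} = \mathbb{L}^{N-k}\big(P_{W^\perp}(\mathcal{L}_{\bar g}(1))\big)\ \mathbb{L}^{k}\big(W \cap \mathcal{L}_{f_d}(1)\big) = \mathbb{L}^{N-k}\big(\mathcal{L}_g(1)\big)\ \mathbb{L}^{k}\big(W \cap \mathcal{L}_{f_d}(1)\big). \tag{37}$$

In order to obtain (27), we are going to show that $\mathcal{L}_g(1) \subset \big(\mathcal{L}_{\|.\|_2}(2\delta_2) \cap W^\perp\big)$. Using Lemma 1 (iii), if
$u \in W^\perp$ is decomposed according to (31), we obtain

$$\begin{aligned} \|u\|_2 = \big(\|u_1\|_2^2 + \|u_2\|_2^2 + \|u_3\|_2^2\big)^{1/2} &\le \|2u_1 + u_2 + u_3\|_2 \\ &\le \|u_1 + u_2\|_2 + \|u_1 + u_3\|_2 \\ &\le \delta_2 h_1(u_1 + u_2) + \delta_2 h_2(u_1 + u_3) \\ &\le 2\delta_2\, g(u). \end{aligned}$$

So $\mathcal{L}_g(1) \subset \big(\mathcal{L}_{\|.\|_2}(2\delta_2) \cap W^\perp\big)$ and $Q'_{J_1,J_2} \le Q_{J_1,J_2}$, for $Q_{J_1,J_2}$ as given in the proposition.

At last, we need to build a uniform bound on $\delta_{J_1,J_2}$ giving rise to (36). Using Lemma 1 (iii), if $u \in W^\perp$
is decomposed according to (31), we obtain

$$\begin{aligned} f_d(u) = f_d(u_1 + u_2 + u_3) &\le f_d(2u_1 + u_2 + u_3) + f_d(u_1) \\ &\le f_d(u_1 + u_2) + f_d(u_1 + u_3) + f_d(u_1) \\ &\le \Delta h_1(u_1 + u_2) + \Delta h_2(u_1 + u_3) + \delta_1\|u_1\|_2. \end{aligned} \tag{38}$$

Using (13), $\|u_1\|_2$ satisfies the following two inequalities

$$\|u_1\|_2 \le \|u_1 + u_2\|_2 \le \delta_2 h_1(u_1 + u_2),$$
$$\|u_1\|_2 \le \|u_1 + u_3\|_2 \le \delta_2 h_2(u_1 + u_3).$$

Adding these inequalities, we obtain

$$\delta_1\|u_1\|_2 \le \frac{\Delta}{2}\big(h_1(u_1 + u_2) + h_2(u_1 + u_3)\big).$$

Using (38), we finally conclude that, for $u \in W^\perp$,

$$f_d(u) \le \frac{3\Delta}{2}\big(h_1(u_1 + u_2) + h_2(u_1 + u_3)\big) \le 3\Delta\, g(u).$$

The proof is complete. □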
4 Sets of data yielding $K$-sparse solutions or sparser

For any given $K \in \{0, \ldots, N\}$ and $\tau > 0$, we introduce the subset $\Sigma^\tau(K)$ as follows:

$$\Sigma^\tau(K) \stackrel{\rm def}{=} \{d \in \mathbb{R}^N : \operatorname{val}(\mathcal{P}_d) \le K\}. \tag{39}$$

All data belonging to $\Sigma^\tau(K)$ generate a solution of $(\mathcal{P}_d)$, see (2), which involves at most $K$ non-zero
components.
Let us define

$$G_K \stackrel{\rm def}{=} \{J \subset I : \dim(\mathcal{T}_J) \le K\}, \tag{40}$$

and recall that $\mathcal{T}_J = \operatorname{span}((\psi_j)_{j\in J})$ according to (22).

The next proposition states a strong and slightly surprising result.

Proposition 3 For any $K \in \{0, \ldots, N\}$, any norm $\|.\|$ and any $\tau > 0$, we have

$$\Sigma^\tau(K) = \bigcup_{J \in G_K} \mathcal{T}_J + \mathcal{L}_{\|.\|}(\tau).$$

Some sets $\Sigma^\tau(K)$, as defined in (39) and explained in the last proposition, are illustrated on Figure 1.
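Proof. The case $K = 0$ is trivial ($G_0 = \{\emptyset\}$) and we assume in the following that $K \ge 1$.

Let $d \in \Sigma^\tau(K)$, i.e. $\operatorname{val}(\mathcal{P}_d) \le K$. This means there is $(\lambda_i)_{i\in I}$, a solution of $(\mathcal{P}_d)$, that satisfies $\ell_0((\lambda_i)_{i\in I}) \le K$. Hence

$$d = \sum_{i \in J} \lambda_i\psi_i + w \quad\text{with } w \in \mathcal{L}_{\|.\|}(\tau)$$

and $J = \{i \in I : \lambda_i \neq 0\}$ with $\#J \le K$.

Consequently $\dim(\mathcal{T}_J) \le \#J \le K$, which implies that $d \in \bigcup_{J\in G_K}\mathcal{T}_J + \mathcal{L}_{\|.\|}(\tau)$.

Conversely, let $d \in \bigcup_{J\in G_K}\mathcal{T}_J + \mathcal{L}_{\|.\|}(\tau)$; then $d = v + w$ where $v \in \bigcup_{J\in G_K}\mathcal{T}_J$ and $w \in \mathcal{L}_{\|.\|}(\tau)$. Then:

• $\exists J \subset I$ such that $v \in \mathcal{T}_J$ and the latter satisfies $\dim(\mathcal{T}_J) \le K$;

• there are real numbers $(\lambda_i)_{i\in J}$ involving at most $\dim(\mathcal{T}_J)$ non-zero components (hence $\ell_0((\lambda_i)_{i\in J}) \le
\dim(\mathcal{T}_J) \le K$) such that $v = \sum_{i\in J}\lambda_i\psi_i$;

• $w \in \mathcal{L}_{\|.\|}(\tau)$ means that $\|w\| \le \tau$.

It follows that $d = \sum_{i\in J}\lambda_i\psi_i + w$ satisfies $\operatorname{val}(\mathcal{P}_d) \le K$, i.e. $d \in \Sigma^\tau(K)$. □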



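Given $J \subset I$, recall that $\mathcal{T}_J = \operatorname{span}((\psi_j)_{j\in J})$, see (22). Since $(\psi_i)_{i\in I}$ is a general family of vectors,
there may be numerous subsets $J_n$, $n = 1, 2, \ldots$, such that $\mathcal{T}_{J_m} = \mathcal{T}_{J_n}$ and $J_m \neq J_n$. A non-redundant
listing of all possible subspaces $\mathcal{T}_J$ when $J$ runs over all subsets of $I$ can be obtained with the help of the
notations below.

For any $K = 0, \ldots, N$, define $\mathcal{J}(K)$ by the following three properties:

(a) $\mathcal{J}(K) \subset \{J \subset I : \dim(\mathcal{T}_J) = K\}$;

(b) $J_1, J_2 \in \mathcal{J}(K)$ and $J_1 \neq J_2 \Rightarrow \mathcal{T}_{J_1} \neq \mathcal{T}_{J_2}$;

(c) $\mathcal{J}(K)$ is maximal: if $J_1 \subset I$ yields $\dim(\mathcal{T}_{J_1}) = K$ then $\exists J \in \mathcal{J}(K)$ such that $\mathcal{T}_J = \mathcal{T}_{J_1}$. (41)

Notice that in particular, $\mathcal{J}(0) = \{\emptyset\}$ and $\#\mathcal{J}(N) = 1$. One can observe that $G_K$, as defined in (40),
satisfies

$$\bigcup_{k=0}^{K} \mathcal{J}(k) \subset G_K$$

and

$$\{\mathcal{T}_J : J \in G_K\} = \{\mathcal{T}_J : J \in \mathcal{J}(k) \text{ for } k \in \{0, \ldots, K\}\}. \tag{42}$$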
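The three properties in (41) translate directly into a small enumeration procedure. The sketch below is one possible way to build a family $\mathcal{J}(K)$, under assumptions made here for illustration: the atoms have integer (or rational) coordinates so that exact arithmetic applies, and each subspace is represented by the reduced row echelon form of a spanning set, which is a canonical label of the span. It uses the fact that any $K$-dimensional $\mathcal{T}_J$ is already spanned by $K$ of its atoms, so only index sets of size $K$ need to be visited. The function names `rref` and `nonredundant_index_sets` are hypothetical.

```python
from fractions import Fraction
from itertools import combinations


def rref(rows):
    """Non-zero rows of the reduced row echelon form, computed exactly.
    Two families of vectors have the same span iff they give the same result."""
    m = [[Fraction(x) for x in r] for r in rows]
    done = 0                                   # number of pivot rows found so far
    ncols = len(m[0]) if m else 0
    for col in range(ncols):
        pivot = next((r for r in range(done, len(m)) if m[r][col] != 0), None)
        if pivot is None:
            continue
        m[done], m[pivot] = m[pivot], m[done]
        p = m[done][col]
        m[done] = [x / p for x in m[done]]
        for r in range(len(m)):
            if r != done and m[r][col] != 0:
                f = m[r][col]
                m[r] = [a - f * b for a, b in zip(m[r], m[done])]
        done += 1
    return tuple(tuple(r) for r in m[:done])


def nonredundant_index_sets(dictionary, K):
    """One index set J per distinct K-dimensional subspace span{psi_j : j in J}."""
    seen = {}
    for J in combinations(range(len(dictionary)), K):
        basis = rref([dictionary[j] for j in J])
        if len(basis) == K:                    # the K selected atoms are independent
            seen.setdefault(basis, J)          # keep the first J found for this span
    return list(seen.values())


# Toy dictionary in R^2 (an assumption): the last two atoms are colinear.
D = [(1, 0), (0, 1), (1, 1), (2, 2)]
for K in range(3):
    print(K, nonredundant_index_sets(D, K))
```

On this toy dictionary the colinear atoms `(1, 1)` and `(2, 2)` yield a single line, and all pairs of independent atoms yield the same plane, in agreement with $\mathcal{J}(0) = \{\emptyset\}$ and $\#\mathcal{J}(N) = 1$.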
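Using these notations, we can give a more convenient formulation of Proposition 3.

Theorem 1 For any $K \in \{0, \ldots, N\}$, any norm $\|.\|$ and any $\tau > 0$, the set $\Sigma^\tau(K)$ of (39) satisfies

$$\Sigma^\tau(K) = \bigcup_{J \in \mathcal{J}(K)} \mathcal{T}_J^\tau,$$

where we recall that for any $J \subset I$ and $\tau > 0$, $\mathcal{T}_J^\tau$ is defined by (23), and $\mathcal{J}(K)$ is defined by (41).
As a consequence, $\Sigma^\tau(K)$ is closed and measurable.

Proof. The case $J = \emptyset$ (and $K = 0$) is trivial because of the convention $\operatorname{span}(\emptyset) = \{0\}$ and $\mathcal{J}(0) = \{\emptyset\}$.
Let us first prove that $\Sigma^\tau(K) = \bigcup_{J\in G_K}\mathcal{T}_J^\tau$. Using Proposition 3,

$$\Sigma^\tau(K) = \Big(\bigcup_{J\in G_K}\mathcal{T}_J\Big) + \mathcal{L}_{\|.\|}(\tau) = \bigcup_{J\in G_K}\big(\mathcal{T}_J + \mathcal{L}_{\|.\|}(\tau)\big).$$

The last equality above is a trivial observation. Using Lemma 2, this yields

$$\Sigma^\tau(K) = \bigcup_{J\in G_K}\mathcal{T}_J^\tau.$$

Using (42), we deduce that

$$\{\mathcal{T}_J^\tau : J \in G_K\} = \{\mathcal{T}_J^\tau : J \in \mathcal{J}(k) \text{ for } k \in \{0, \ldots, K\}\},$$

and therefore,

$$\Sigma^\tau(K) = \bigcup_{k=0}^{K}\ \bigcup_{J\in\mathcal{J}(k)}\mathcal{T}_J^\tau.$$

Moreover, for any $k \le K$ and $J \in \mathcal{J}(k)$, we can find $J_1 \in \mathcal{J}(K)$ such that $\mathcal{T}_J \subset \mathcal{T}_{J_1}$. Using Lemma 2, we
find that $\mathcal{T}_J^\tau \subset \mathcal{T}_{J_1}^\tau$. Consequently,

$$\Sigma^\tau(K) = \bigcup_{J\in\mathcal{J}(K)}\mathcal{T}_J^\tau.$$

This completes the proof of the first statement.

By Proposition 1, $\Sigma^\tau(K)$ is a finite union of closed measurable sets, hence it is closed and measurable
as well. □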



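For any $K = 0, \ldots, N$, define the constants $\delta_K$ and $C_K$ as follows:

$$\delta_K \stackrel{\rm def}{=} \max_{J \in \mathcal{J}(K)} \delta_J, \tag{43}$$

$$C_K \stackrel{\rm def}{=} \sum_{J \in \mathcal{J}(K)} C_J, \tag{44}$$

where $\delta_J \in [0, \Delta]$ and $C_J$ are the constants exhibited in Corollary 1, assertion (ii). Clearly,

$$0 \le \delta_K \le \Delta. \tag{45}$$

In particular,

$$C_0 = \mathbb{L}^N\big(\mathcal{L}_{\|.\|}(1)\big) \quad\text{and}\quad C_N = \mathbb{L}^N\big(\mathcal{L}_{f_d}(1)\big). \tag{46}$$

With $\mathcal{J}(K)$, let us associate the family of subsets:

$$\mathcal{H}(K, k) \stackrel{\rm def}{=} \big\{(J_1, J_2) \in \mathcal{J}(K)^2 \text{ such that } \dim(\mathcal{T}_{J_1} \cap \mathcal{T}_{J_2}) = k\big\}, \tag{47}$$

where $K = 1, 2, \ldots, N$ and $k = 0, 1, \ldots, K - 1$.

Notice that $\mathcal{H}(K, k)$ may be empty for some $k$. Consider $(J_1, J_2) \in \mathcal{J}(K)^2$ such that
$\dim(\mathcal{T}_{J_1} \cap \mathcal{T}_{J_2}) = k$. Then

$$\dim(\mathcal{T}_{J_1} + \mathcal{T}_{J_2}) = k + (K - k) + (K - k) \le N$$

and $k \ge 2K - N$. We see that

$$\mathcal{H}(K, k) \neq \emptyset \ \Rightarrow\ k \ge 2K - N.$$

Conversely,

$$k < k_K \stackrel{\rm def}{=} \max\{0, 2K - N\} \ \Rightarrow\ \mathcal{H}(K, k) = \emptyset. \tag{48}$$

Notice that $\mathcal{H}(N, k) = \emptyset$, for all $k = 0, \ldots, N - 1$, and that for any $K = 1, \ldots, N - 1$, we have $0 \le k_K \le K - 1$.
For $K \in \{1, \ldots, N - 1\}$ and $k \in \{k_K, \ldots, K - 1\}$ let us define

$$\delta'_{K,k} \stackrel{\rm def}{=} \max\Big\{0,\ \max_{(J_1,J_2)\in\mathcal{H}(K,k)} \delta_{J_1,J_2}\Big\}, \tag{49}$$

$$Q_{K,k} = \sum_{(J_1,J_2)\in\mathcal{H}(K,k)} Q_{J_1,J_2}, \tag{50}$$

where $Q_{J_1,J_2}$ and $\delta_{J_1,J_2} \in [0, 3\Delta]$ are as in Proposition 2. It is clear that if $\mathcal{H}(K, k) = \emptyset$ then we find
$Q_{K,k} = 0$ and $\delta'_{K,k} = 0$. It follows that for any $K = 1, \ldots, N - 1$ and any $k = k_K, \ldots, K - 1$

$$0 \le \delta'_{K,k} \le 3\Delta. \tag{51}$$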
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Last, define 

, if if = 

max Ak-1, Sk , max S'j^j. } ,iiO<K <N (52) 

,if K = N 
Using (45) and (51),

$$0 \le \Delta_K \le 3\Delta. \tag{53}$$

All these constants, introduced between (43) and (52), depend only on the family $(\psi_i)_{i\in I}$, the norms
$\|.\|$ and $f_d$, $K$ and $k$. Their upper bounds using $\Delta$ only depend on $\|.\|$ and $f_d$. They are involved in the
theorem below, which provides a critical result in this work.

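Theorem 2 Let $K \in \{0, \ldots, N\}$, the norms $\|.\|$ and $f_d$, and $(\psi_i)_{i \in I}$ be any. Let $\tau > 0$ and $\theta > \tau \Delta_K$, where $\Delta_K$ is defined in (52). The Lebesgue measure in $\mathbb{R}^N$ of the set $\mathcal{T}^\tau(K)$ defined by (39) satisfies

$$C_K \tau^{N-K} (\theta - \delta_K \tau)^K - \theta^N \varepsilon_0(K,\tau,\theta) \;\le\; \mathbb{L}^N\big(\mathcal{T}^\tau(K) \cap \mathcal{L}_{f_d}(\theta)\big) \;\le\; C_K \tau^{N-K} (\theta + \delta_K \tau)^K, \tag{54}$$

where

$$\varepsilon_0(K,\tau,\theta) = \begin{cases} 0 & \text{if } K = 0 \text{ or } K = N, \\[4pt] \displaystyle\sum_{k=k_K}^{K-1} Q_{K,k} \Big(\frac{\tau}{\theta}\Big)^{N-k} \Big(1 + \delta'_{K,k} \frac{\tau}{\theta}\Big)^{k} & \text{if } 0 < K < N, \end{cases} \tag{55}$$

for $C_K$, $k_K$, $Q_{K,k}$, $\delta_K$ and $\delta'_{K,k}$ defined by (44), (48), (50), (43) and (49), respectively. Moreover, (45), (51) and (53) provide bounds on $\delta_K$, $\delta'_{K,k}$ and $\Delta_K$, respectively, which depend only on $\|.\|$ and $f_d$, via $\Delta$ (see Lemma 1 (iii)).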

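Remark 6 We posit the assumptions of Theorem 2. Then asymptotically

$$\frac{1}{\theta^N}\, \mathbb{L}^N\big(\mathcal{T}^\tau(K) \cap \mathcal{L}_{f_d}(\theta)\big) = C_K \Big(\frac{\tau}{\theta}\Big)^{N-K} + o\Big(\Big(\frac{\tau}{\theta}\Big)^{N-K}\Big) \quad \text{as } \frac{\tau}{\theta} \to 0.$$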
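
To see how the two sides of (54) close up on the leading term $C_K (\tau/\theta)^{N-K}$, the bounds can be evaluated numerically after division by $\theta^N$. The following is a minimal sketch: the function names and the numerical constants in the usage example are hypothetical placeholders, not values computed from any particular dictionary.

```python
def epsilon0(K, N, ratio, Q, delta_prime):
    """Error term (55), with ratio = tau / theta.

    Q and delta_prime are dicts mapping k to Q_{K,k} and delta'_{K,k}.
    """
    if K == 0 or K == N:
        return 0.0
    k_K = max(0, 2 * K - N)
    return sum(
        Q[k] * ratio ** (N - k) * (1 + delta_prime[k] * ratio) ** k
        for k in range(k_K, K)
    )


def normalized_bounds(K, N, ratio, C_K, delta_K, Q, delta_prime):
    """Lower and upper bounds of (54), both divided by theta**N."""
    main = C_K * ratio ** (N - K)
    lower = main * (1 - delta_K * ratio) ** K - epsilon0(K, N, ratio, Q, delta_prime)
    upper = main * (1 + delta_K * ratio) ** K
    return lower, upper


if __name__ == "__main__":
    # Hypothetical constants for N = 3, K = 2 (illustration only).
    N, K, C_K, delta_K = 3, 2, 5.0, 1.0
    Q, delta_prime = {1: 4.0}, {1: 2.0}
    for ratio in (1e-1, 1e-2, 1e-3, 1e-4):
        lo, up = normalized_bounds(K, N, ratio, C_K, delta_K, Q, delta_prime)
        # Both rescaled bounds approach C_K as tau / theta decreases.
        print(ratio, lo / ratio ** (N - K), up / ratio ** (N - K))
```
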



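Proof. Using Theorem 1 it is straightforward that

$$\mathcal{T}^\tau(K) \cap \mathcal{L}_{f_d}(\theta) = \bigcup_{J \in \mathcal{J}(K)} \big(T_J^\tau \cap \mathcal{L}_{f_d}(\theta)\big) \tag{56}$$

and that

$$\mathbb{L}^N\big(\mathcal{T}^\tau(K) \cap \mathcal{L}_{f_d}(\theta)\big) = \mathbb{L}^N\Big(\bigcup_{J \in \mathcal{J}(K)} \big(T_J^\tau \cap \mathcal{L}_{f_d}(\theta)\big)\Big). \tag{57}$$

When $K = 0$ or $K = N$, we have $\#\mathcal{J}(K) = 1$. Then (54) is a straightforward consequence of (57) and Proposition 1 (the latter can be applied thanks to the assumption $\theta > \tau \Delta_K$ and (52)).

The rest of the proof is to find relevant bounds for the right-hand side of (57) under the assumption that $0 < K < N$.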
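Upper bound. By the definition of a measure, and then using Corollary 1, it is found that

$$\mathbb{L}^N\big(\mathcal{T}^\tau(K) \cap \mathcal{L}_{f_d}(\theta)\big) \le \sum_{J \in \mathcal{J}(K)} \mathbb{L}^N\big(T_J^\tau \cap \mathcal{L}_{f_d}(\theta)\big) \tag{58}$$

$$\le \sum_{J \in \mathcal{J}(K)} C_J \tau^{N-K} (\theta + \delta_J \tau)^K \le C_K \tau^{N-K} (\theta + \delta_K \tau)^K,$$

where the constants $\delta_K$ and $C_K$ are defined in (43) and (44), respectively.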

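Lower bound. First we represent the right-hand side of (56) as a union of disjoint subsets. Since $\mathcal{J}(K)$ is finite, let us enumerate its elements as

$$\mathcal{J}(K) = \{J_1, \ldots, J_M\} \quad \text{where } M = \#(\mathcal{J}(K)).$$

To simplify the expressions that follow, for any $J$ we denote

$$B_J = T_J^\tau \cap \mathcal{L}_{f_d}(\theta). \tag{59}$$

Then

$$\bigcup_{J \in \mathcal{J}(K)} B_J = \bigcup_{i=1}^{M} B_{J_i}.$$

Consider the following decomposition:

$$\bigcup_{i=1}^{M} B_{J_i} = B_{J_1} \cup \big(B_{J_2} \setminus (B_{J_1} \cap B_{J_2})\big) \cup \ldots \cup \Big(B_{J_M} \setminus \bigcup_{j=1}^{M-1} (B_{J_j} \cap B_{J_M})\Big)$$

$$= B_{J_1} \cup \bigcup_{i=2}^{M} \Big(B_{J_i} \setminus \bigcup_{j=1}^{i-1} (B_{J_j} \cap B_{J_i})\Big).$$

Since the last row is a union of disjoint sets, we have

$$\mathbb{L}^N\Big(\bigcup_{i=1}^{M} B_{J_i}\Big) = \mathbb{L}^N(B_{J_1}) + \sum_{i=2}^{M} \mathbb{L}^N\Big(B_{J_i} \setminus \bigcup_{j=1}^{i-1} (B_{J_j} \cap B_{J_i})\Big).$$

Noticing that $\bigcup_{j=1}^{i-1} (B_{J_j} \cap B_{J_i}) \subset B_{J_i}$ entails that

$$\mathbb{L}^N\Big(B_{J_i} \setminus \bigcup_{j=1}^{i-1} (B_{J_j} \cap B_{J_i})\Big) = \mathbb{L}^N(B_{J_i}) - \mathbb{L}^N\Big(\bigcup_{j=1}^{i-1} (B_{J_j} \cap B_{J_i})\Big), \quad \forall i = 2, \ldots, M.$$

Hence

$$\mathbb{L}^N\Big(\bigcup_{i=1}^{M} B_{J_i}\Big) = \sum_{i=1}^{M} \mathbb{L}^N(B_{J_i}) - \sum_{i=2}^{M} \mathbb{L}^N\Big(\bigcup_{j=1}^{i-1} (B_{J_j} \cap B_{J_i})\Big). \tag{60}$$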
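
Identity (60) holds for any measure, so it can be sanity-checked with the counting measure on finite sets. The sketch below is only such a check: the function name `union_measure_via_60` and the example sets are illustrative, with finite Python sets standing in for the sets $B_{J_i}$.

```python
def union_measure_via_60(sets):
    """Counting-measure version of identity (60).

    Returns sum_i |B_i| - sum_{i>=2} |union_{j<i} (B_j ∩ B_i)|,
    which should equal the size of the union of all B_i.
    """
    total = sum(len(B) for B in sets)
    correction = 0
    for i in range(1, len(sets)):
        overlap = set()
        for j in range(i):
            overlap |= sets[j] & sets[i]
        correction += len(overlap)
    return total - correction


if __name__ == "__main__":
    B = [{1, 2, 3}, {3, 4}, {1, 4, 5}]
    # Both values agree: the union is {1, 2, 3, 4, 5}.
    print(union_measure_via_60(B), len(set().union(*B)))
```
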

Using successively (59), assertion (ii) of Corollary 1, (43), (44) and $\theta > \tau \Delta_K$ shows that

$$\sum_{i=1}^{M} \mathbb{L}^N(B_{J_i}) = \sum_{J \in \mathcal{J}(K)} \mathbb{L}^N\big(T_J^\tau \cap \mathcal{L}_{f_d}(\theta)\big) \ge \sum_{J \in \mathcal{J}(K)} C_J \tau^{N-K} (\theta - \delta_J \tau)^K \ge C_K \tau^{N-K} (\theta - \delta_K \tau)^K, \tag{61}$$

where the constants $\delta_K$ and $C_K$ are given in (43) and (44), respectively.

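Using the original notation (59), each term, for $i = 2, \ldots, M$, in the last sum in (60) satisfies

$$\mathbb{L}^N\Big(\bigcup_{j=1}^{i-1} (B_{J_j} \cap B_{J_i})\Big) \le \sum_{j=1}^{i-1} \mathbb{L}^N(B_{J_j} \cap B_{J_i}) = \sum_{j=1}^{i-1} \mathbb{L}^N\big(\mathcal{L}_{f_d}(\theta) \cap T_{J_j}^\tau \cap T_{J_i}^\tau\big). \tag{62}$$

Let us recall that $\dim(T_{J_i}) = K$ for every $i = 1, \ldots, M$ and that, by the definition of $\mathcal{J}(K)$, we have $T_{J_i} \neq T_{J_j}$ if $i \neq j$. Proposition 2 can hence be applied to each term of the last sum:

$$\mathbb{L}^N\big(\mathcal{L}_{f_d}(\theta) \cap T_{J_j}^\tau \cap T_{J_i}^\tau\big) \le Q_{J_j,J_i}\, \tau^{N-k_{i,j}} (\theta + \delta_{J_j,J_i} \tau)^{k_{i,j}},$$

where $k_{i,j} = \dim(T_{J_i} \cap T_{J_j})$.

Then (62) leads to

$$\mathbb{L}^N\Big(\bigcup_{j=1}^{i-1} (B_{J_j} \cap B_{J_i})\Big) \le \sum_{j=1}^{i-1} Q_{J_j,J_i}\, \tau^{N-k_{i,j}} (\theta + \delta_{J_j,J_i} \tau)^{k_{i,j}}.$$

By rearranging the last sum in (60) and taking into account (48), we obtain

$$\sum_{i=2}^{M} \mathbb{L}^N\Big(\bigcup_{j=1}^{i-1} (B_{J_j} \cap B_{J_i})\Big) \le \sum_{k=k_K}^{K-1} Q_{K,k}\, \tau^{N-k} (\theta + \delta'_{K,k} \tau)^{k}, \tag{63}$$

where $\delta'_{K,k}$ and $Q_{K,k}$ are given in (49) and (50), respectively.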

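Combining (57) along with the original notations (59) and then (60), (61) and (63) yields

$$\mathbb{L}^N\big(\mathcal{T}^\tau(K) \cap \mathcal{L}_{f_d}(\theta)\big) = \mathbb{L}^N\Big(\bigcup_{i=1}^{M} B_{J_i}\Big) \ge C_K \tau^{N-K} (\theta - \delta_K \tau)^K - \theta^N \varepsilon_0(K,\tau,\theta),$$

where $\varepsilon_0(.)$ is as in the theorem. This finishes the proof. □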



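Remark 7 In the proof of this theorem we could notice (see (60), (62) and (58)) that

$$\sum_{J \in \mathcal{J}(K)} \mathbb{L}^N\big(T_J^\tau \cap \mathcal{L}_{f_d}(\theta)\big) - \sum_{k=k_K}^{K-1} \sum_{(J_1,J_2) \in \mathcal{H}(K,k)} \mathbb{L}^N\big(\mathcal{L}_{f_d}(\theta) \cap T_{J_1}^\tau \cap T_{J_2}^\tau\big)$$

$$\le \mathbb{L}^N\big(\mathcal{T}^\tau(K) \cap \mathcal{L}_{f_d}(\theta)\big) \tag{64}$$

$$\le \sum_{J \in \mathcal{J}(K)} \mathbb{L}^N\big(T_J^\tau \cap \mathcal{L}_{f_d}(\theta)\big).$$

These are the main approximations of $\mathbb{L}^N(\mathcal{T}^\tau(K) \cap \mathcal{L}_{f_d}(\theta))$ in the proof of the theorem. The bounds given in the theorem could be made more accurate by sharpening the above inequalities. The loss of accuracy has, however, the same order of magnitude as the precision in the computation of $\mathbb{L}^N(T_J^\tau \cap \mathcal{L}_{f_d}(\theta))$.

The constants $\Delta_K$, $\delta_K$ and $\delta'_{K,k}$ depend on $(\psi_i)_{i \in I}$ and $K$. Using the uniform bound $\Delta$ exhibited in Lemma 1 (ii) in place of $\delta_K$ and $\delta'_{K,k}$ leads to a more general but less precise result.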
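Corollary 2 Let $K \in \{0, \ldots, N\}$, the norms $\|.\|$ and $f_d$, and $(\psi_i)_{i \in I}$ be any. Let $\tau > 0$ and $\theta > 3\tau\Delta$, where $\Delta$ is derived in Lemma 1 (ii) and depends only on $f_d$ and $\|.\|$. The set $\mathcal{T}^\tau(K)$ defined by (39) satisfies

$$C_K \tau^{N-K} (\theta - \Delta \tau)^K - \theta^N \varepsilon_0(K,\tau,\theta) \;\le\; \mathbb{L}^N\big(\mathcal{T}^\tau(K) \cap \mathcal{L}_{f_d}(\theta)\big) \;\le\; C_K \tau^{N-K} (\theta + \Delta \tau)^K, \tag{65}$$

where

$$\varepsilon_0(K,\tau,\theta) = \begin{cases} 0 & \text{if } K = 0 \text{ or } K = N, \\[4pt] \displaystyle\sum_{k=k_K}^{K-1} Q_{K,k} \Big(\frac{\tau}{\theta}\Big)^{N-k} \Big(1 + 3\Delta \frac{\tau}{\theta}\Big)^{k} & \text{if } 0 < K < N. \end{cases}$$

Moreover, for $K = 1, \ldots, N-1$ and $k = k_K, \ldots, K-1$, we have

$$Q_{K,k} \le \#\mathcal{J}(K)\,(\#\mathcal{J}(K) - 1)\, a(N-k)\, a(k)\, (2\delta_2)^{N-k}\, \delta_3^{k}, \tag{66}$$

where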

$a(n)$ is the volume of the unit ball for the Euclidean norm in $\mathbb{R}^n$ (see equation (78) for details), $\delta_2$ is defined
in Lemma{^ (see equation pOp ) and S3 is such that 

$$\|w\|_2 \le \delta_3 f_d(w), \quad \forall w \in \mathbb{R}^N. \tag{67}$$

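Proof. Equation (65) is obtained by inserting in (54) in Theorem 2 the uniform bounds on $\delta_K$, $\delta'_{K,k}$ and $\Delta_K$ given in (45), (51) and (53), respectively.

The upper bound for $Q_{K,k}$ is obtained as follows. Using (50) and (77), we obtain

$$Q_{K,k} \le \sum_{(J_1,J_2) \in \mathcal{H}(K,k)} \mathbb{L}^{N-k}\big((T_{J_1} \cap T_{J_2})^\perp \cap \mathcal{L}_{\|.\|_2}(2\delta_2)\big)\; \mathbb{L}^{k}\big(T_{J_1} \cap T_{J_2} \cap \mathcal{L}_{f_d}(1)\big).$$

Moreover,

$$\mathbb{L}^{N-k}\big((T_{J_1} \cap T_{J_2})^\perp \cap \mathcal{L}_{\|.\|_2}(2\delta_2)\big) = a(N-k)\,(2\delta_2)^{N-k},$$

$$\mathbb{L}^{k}\big(T_{J_1} \cap T_{J_2} \cap \mathcal{L}_{f_d}(1)\big) \le \mathbb{L}^{k}\big(T_{J_1} \cap T_{J_2} \cap \mathcal{L}_{\|.\|_2}(\delta_3)\big) = a(k)\,(\delta_3)^{k},$$

and we obviously have

$$\#\mathcal{H}(K,k) \le \#\mathcal{J}(K)\,(\#\mathcal{J}(K) - 1).$$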

□ 

The above corollary shows that the "quality" of the asymptotic as $\frac{\tau}{\theta} \to 0$ depends on $\|.\|$, $f_d$ and on the dictionary through the terms $Q_{K,k}$. The latter terms are bounded from above using (66) and (67) and they are clearly overestimated. Even though the bounds we provide are very pessimistic, they depend only on $f_d$ and $\#I$ and can be computed.

Remark 8 Let us emphasize that "uniform" bounds in the spirit of Corollary 2 can be derived from Proposition 4 and Theorems 3, 4 and 5. We leave this task to interested readers who need to compute the relevant bounds easily.
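5 Sets of data yielding K-sparse solutions

For any $K \in \{0, \ldots, N\}$ and $\tau > 0$, we denote

$$\mathcal{D}^\tau(K) \stackrel{\mathrm{def}}{=} \big\{d \in \mathbb{R}^N : \mathrm{val}(\mathcal{P}_d) = K\big\}. \tag{68}$$

From the definition of $\mathcal{T}^\tau(K)$ in (39), it is straightforward that

$$\mathcal{D}^\tau(K) = \mathcal{T}^\tau(K) \setminus \mathcal{T}^\tau(K-1), \quad \forall K \in \{0, \ldots, N\}, \tag{69}$$

where we extend the definition of $\mathcal{T}^\tau(K)$ with

$$\mathcal{T}^\tau(-1) = \emptyset.$$

Being the difference of two measurable closed sets, $\mathcal{D}^\tau(K)$ is clearly measurable. Noticing also that

$$\mathcal{T}^\tau(K-1) \subset \mathcal{T}^\tau(K) \tag{70}$$

we get

$$\mathbb{L}^N\big(\mathcal{D}^\tau(K) \cap \mathcal{L}_{f_d}(\theta)\big) = \mathbb{L}^N\big(\mathcal{T}^\tau(K) \cap \mathcal{L}_{f_d}(\theta)\big) - \mathbb{L}^N\big(\mathcal{T}^\tau(K-1) \cap \mathcal{L}_{f_d}(\theta)\big). \tag{71}$$

Combining these observations with Theorem 2 yields an important statement which is given below.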

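Theorem 3 Let $K \in \{0, \ldots, N\}$, the norms $\|.\|$ and $f_d$, and $(\psi_i)_{i \in I}$ be any. Let $\tau > 0$ and $\theta > \tau \max(\Delta_K, \Delta_{K-1})$, where $\Delta_k$ is defined in (52), for $k \in \{K-1, K\}$. The Lebesgue measure in $\mathbb{R}^N$ of the set $\mathcal{D}^\tau(K) = \{d \in \mathbb{R}^N : \mathrm{val}(\mathcal{P}_d) = K\}$ defined in (68) satisfies

$$C_K \tau^{N-K} (\theta - \delta_K \tau)^K - \theta^N \varepsilon'_0(K,\tau,\theta) \;\le\; \mathbb{L}^N\big(\mathcal{D}^\tau(K) \cap \mathcal{L}_{f_d}(\theta)\big) \tag{72}$$

$$\le\; C_K \tau^{N-K} (\theta + \delta_K \tau)^K + \theta^N \varepsilon_1(K,\tau,\theta), \tag{73}$$

with

$$\varepsilon'_0(K,\tau,\theta) = \varepsilon_0(K,\tau,\theta) + C_{K-1} \Big(\frac{\tau}{\theta}\Big)^{N-K+1} \Big(1 + \delta_{K-1} \frac{\tau}{\theta}\Big)^{K-1},$$

where $C_k$ for $k \in \{K-1, K\}$ are defined by (44), along with the extension $C_{-1} = 0$, whereas $\varepsilon_0$ is as in Theorem 2 with the extension $\varepsilon_0(-1,\tau,\theta) = 0$.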

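Proof. By (71), we have

$$\mathbb{L}^N\big(\mathcal{D}^\tau(K) \cap \mathcal{L}_{f_d}(\theta)\big) \le \text{Upper bound}\Big(\mathbb{L}^N\big(\mathcal{T}^\tau(K) \cap \mathcal{L}_{f_d}(\theta)\big)\Big) - \text{Lower bound}\Big(\mathbb{L}^N\big(\mathcal{T}^\tau(K-1) \cap \mathcal{L}_{f_d}(\theta)\big)\Big),$$

$$\mathbb{L}^N\big(\mathcal{D}^\tau(K) \cap \mathcal{L}_{f_d}(\theta)\big) \ge \text{Lower bound}\Big(\mathbb{L}^N\big(\mathcal{T}^\tau(K) \cap \mathcal{L}_{f_d}(\theta)\big)\Big) - \text{Upper bound}\Big(\mathbb{L}^N\big(\mathcal{T}^\tau(K-1) \cap \mathcal{L}_{f_d}(\theta)\big)\Big),$$

where the relevant upper and lower bounds were derived in Theorem 2. Since $\mathbb{L}^N(\mathcal{T}^\tau(K-1) \cap \mathcal{L}_{f_d}(\theta))$ is negligible compared to $\mathbb{L}^N(\mathcal{T}^\tau(K) \cap \mathcal{L}_{f_d}(\theta))$, the bounds corresponding to this term are introduced in the error functions $\varepsilon'_0(K,\tau,\theta)$ and $\varepsilon_1(K,\tau,\theta)$. □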
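
The two subtractions in this proof can be written out by applying (54) at levels $K$ and $K-1$. The display below is a worked sketch rather than a quotation of the statement: $m_K$ is shorthand introduced here for the measure bounded in (54), so that by (71) the measure of the $K$-sparse data inside the ball is $m_K - m_{K-1}$. The last line, which drops a nonnegative term, is one possible way to group the upper error term and is an assumption about its form.

```latex
% m_K : shorthand (introduced for this sketch) for the measure bounded in (54).
\begin{align*}
m_K - m_{K-1}
  &\ge \underbrace{C_K \tau^{N-K}(\theta-\delta_K\tau)^K
        - \theta^N\varepsilon_0(K,\tau,\theta)}_{\text{lower bound on } m_K}
     - \underbrace{C_{K-1}\tau^{N-K+1}(\theta+\delta_{K-1}\tau)^{K-1}}_{\text{upper bound on } m_{K-1}} \\
  &= C_K \tau^{N-K}(\theta-\delta_K\tau)^K
     - \theta^N\Big[\varepsilon_0(K,\tau,\theta)
       + C_{K-1}\Big(\frac{\tau}{\theta}\Big)^{N-K+1}
         \Big(1+\delta_{K-1}\frac{\tau}{\theta}\Big)^{K-1}\Big], \\[6pt]
m_K - m_{K-1}
  &\le C_K\tau^{N-K}(\theta+\delta_K\tau)^K
     - C_{K-1}\tau^{N-K+1}(\theta-\delta_{K-1}\tau)^{K-1}
     + \theta^N\varepsilon_0(K-1,\tau,\theta) \\
  &\le C_K\tau^{N-K}(\theta+\delta_K\tau)^K
     + \theta^N\varepsilon_0(K-1,\tau,\theta).
\end{align*}
```

The bracket in the second line is the lower error term of the theorem, and the extension $\varepsilon_0(-1,\tau,\theta) = 0$ makes the last line meaningful for $K = 0$.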
Remark 9 Let us emphasize that Remark 6 is valid if we write $\mathcal{D}^\tau(K)$ in place of $\mathcal{T}^\tau(K)$. This gives the asymptotic of $\mathbb{L}^N(\mathcal{D}^\tau(K) \cap \mathcal{L}_{f_d}(\theta))$ as $\frac{\tau}{\theta}$ goes to 0. This observation may seem surprising. It only means that, as $\frac{\tau}{\theta}$ decreases, the chance of getting a solution with sparsity strictly smaller than $K$ is very small when compared to the chance of getting a sparsity $K$.

Remark 10 In Section 4 we adapted Theorem 2 to get Corollary 2. In the latter, the gap between the lower and upper bounds only depends on $\|.\|$, $f_d$ and $Q_{K,k}$, the latter depending on the dictionary in a controllable way. A similar adaptation of Theorem 3 is easy.

6 Statistical meaning of the results 

In this section we give a statistical interpretation of our main results, namely Theorem 2 and Theorem 3.

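Proposition 4 Let $f_d$ and $\|.\|$ be any two norms and $(\psi_i)_{i \in I}$ be a dictionary in $\mathbb{R}^N$. For any $K \in \{0, \ldots, N\}$, let $\tau > 0$ and $\theta$ be such that $\theta > \tau \Delta_K$, where $\Delta_K$ is defined in (52). Consider a random variable $d$ with uniform distribution on $\mathcal{L}_{f_d}(\theta)$. Then

$$\frac{C_K}{\mathbb{L}^N(\mathcal{L}_{f_d}(1))} \Big(\frac{\tau}{\theta}\Big)^{N-K} \Big(1 - \delta_K \frac{\tau}{\theta}\Big)^{K} - \frac{\varepsilon_0(K,\tau,\theta)}{\mathbb{L}^N(\mathcal{L}_{f_d}(1))} \;\le\; \mathbb{P}\big(\mathrm{val}(\mathcal{P}_d) \le K\big) \;\le\; \frac{C_K}{\mathbb{L}^N(\mathcal{L}_{f_d}(1))} \Big(\frac{\tau}{\theta}\Big)^{N-K} \Big(1 + \delta_K \frac{\tau}{\theta}\Big)^{K},$$

where $\varepsilon_0(K,\tau,\theta)$ is given in Theorem 2, equation (55). Moreover we have the following asymptotical result:

$$\mathbb{P}\big(\mathrm{val}(\mathcal{P}_d) \le K\big) = \frac{C_K}{\mathbb{L}^N(\mathcal{L}_{f_d}(1))} \Big(\frac{\tau}{\theta}\Big)^{N-K} + o\Big(\Big(\frac{\tau}{\theta}\Big)^{N-K}\Big) \quad \text{as } \frac{\tau}{\theta} \to 0.$$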

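Proof. Consider the set $\mathcal{T}^\tau(K)$ defined by (39). We have

$$\mathbb{P}\big(\mathrm{val}(\mathcal{P}_d) \le K\big) = \mathbb{P}\big(d \in \mathcal{T}^\tau(K) \cap \mathcal{L}_{f_d}(\theta)\big) = \frac{\mathbb{L}^N\big(\mathcal{T}^\tau(K) \cap \mathcal{L}_{f_d}(\theta)\big)}{\mathbb{L}^N\big(\mathcal{L}_{f_d}(\theta)\big)},$$

since $d$ is uniformly distributed on $\mathcal{L}_{f_d}(\theta)$. The inequality result follows from Theorem 2, equation (54), and uses the observation that $\mathbb{L}^N(\mathcal{L}_{f_d}(\theta)) = \theta^N\, \mathbb{L}^N(\mathcal{L}_{f_d}(1))$.

The asymptotical result is a direct consequence of Remark 6. □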



Remark 11 Notice that, as already noticed in (46), $C_N = \mathbb{L}^N(\mathcal{L}_{f_d}(1))$ and the asymptotic in Proposition 4 reads for $K = N$

$$\mathbb{P}\big(\mathrm{val}(\mathcal{P}_d) \le N\big) = 1 + o(1) \quad \text{as } \frac{\tau}{\theta} \to 0.$$

In fact a better estimate is easy to obtain in this particular case. We know indeed that for all $d \in \mathbb{R}^N$, any solution of $(\mathcal{P}_d)$ involves an independent system of elements of $(\psi_i)_{i \in I}$. (A sparser decomposition would otherwise exist.) Therefore we know that for all $d \in \mathbb{R}^N$, $\mathrm{val}(\mathcal{P}_d) \le N$. This yields

$$\mathbb{P}\big(\mathrm{val}(\mathcal{P}_d) \le N\big) = 1. \tag{74}$$
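Theorem 4 Let $f_d$ and $\|.\|$ be any two norms and $(\psi_i)_{i \in I}$ be a dictionary in $\mathbb{R}^N$. For any $K \in \{0, \ldots, N\}$, let $\tau > 0$ and $\theta$ be such that $\theta > \tau \max(\Delta_K, \Delta_{K-1})$, where $\Delta_k$ is defined in (52). Consider a random variable $d$ with uniform distribution on $\mathcal{L}_{f_d}(\theta)$. Then we have

$$\frac{C_K}{\mathbb{L}^N(\mathcal{L}_{f_d}(1))} \Big(\frac{\tau}{\theta}\Big)^{N-K} \Big(1 - \delta_K \frac{\tau}{\theta}\Big)^{K} - \frac{\varepsilon'_0(K,\tau,\theta)}{\mathbb{L}^N(\mathcal{L}_{f_d}(1))} \;\le\; \mathbb{P}\big(\mathrm{val}(\mathcal{P}_d) = K\big) \;\le\; \frac{C_K}{\mathbb{L}^N(\mathcal{L}_{f_d}(1))} \Big(\frac{\tau}{\theta}\Big)^{N-K} \Big(1 + \delta_K \frac{\tau}{\theta}\Big)^{K} + \frac{\varepsilon_1(K,\tau,\theta)}{\mathbb{L}^N(\mathcal{L}_{f_d}(1))},$$

for $\varepsilon'_0$ and $\varepsilon_1$ as defined in Theorem 3 and for $\delta_K$ and $C_K$ defined in (43) and (44), respectively. In particular, we have

$$\mathbb{P}\big(\mathrm{val}(\mathcal{P}_d) = K\big) = \frac{C_K}{\mathbb{L}^N(\mathcal{L}_{f_d}(1))} \Big(\frac{\tau}{\theta}\Big)^{N-K} + o\Big(\Big(\frac{\tau}{\theta}\Big)^{N-K}\Big) \quad \text{as } \frac{\tau}{\theta} \to 0. \tag{75}$$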



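Proof. Consider the set $\mathcal{D}^\tau(K) = \{d \in \mathbb{R}^N : \mathrm{val}(\mathcal{P}_d) = K\}$ defined in (68). We have

$$\mathbb{P}\big(\mathrm{val}(\mathcal{P}_d) = K\big) = \mathbb{P}\big(d \in \mathcal{D}^\tau(K) \cap \mathcal{L}_{f_d}(\theta)\big) = \frac{\mathbb{L}^N\big(\mathcal{D}^\tau(K) \cap \mathcal{L}_{f_d}(\theta)\big)}{\mathbb{L}^N\big(\mathcal{L}_{f_d}(\theta)\big)},$$

since $d$ is uniformly distributed on $\mathcal{L}_{f_d}(\theta)$. The inequality result follows from Theorem 3, equations (72) and (73), and $\mathbb{L}^N(\mathcal{L}_{f_d}(\theta)) = \mathbb{L}^N(\mathcal{L}_{f_d}(1))\, \theta^N$. □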



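Remark 12 From (75) and (46) we see that

$$\mathbb{P}\big(\mathrm{val}(\mathcal{P}_d) = N\big) = 1 + o(1) \quad \text{as } \frac{\tau}{\theta} \to 0.$$

For any other $K \in \{0, \ldots, N-1\}$, $\mathbb{P}(\mathrm{val}(\mathcal{P}_d) = K)$ goes to 0 as $\frac{\tau}{\theta} \to 0$. Moreover, we know how rapidly they go to 0. In particular, we know that $\mathbb{P}(\mathrm{val}(\mathcal{P}_d) = K-1)$ becomes negligible when compared to $\mathbb{P}(\mathrm{val}(\mathcal{P}_d) = K)$, as $\frac{\tau}{\theta} \to 0$.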
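Notice that even though $d$ is a random variable on a subset of $\mathbb{R}^N$, the value of our function $\mathrm{val}(\mathcal{P}_d)$ is a non-negative integer. We can also compute the expectation of $\mathrm{val}(\mathcal{P}_d)$:

$$\mathbb{E}\big(\mathrm{val}(\mathcal{P}_d)\big) = \sum_{K=1}^{N} K\, \mathbb{P}\big(\mathrm{val}(\mathcal{P}_d) = K\big)$$

$$= \sum_{K=1}^{N} K \Big(\mathbb{P}\big(\mathrm{val}(\mathcal{P}_d) \le K\big) - \mathbb{P}\big(\mathrm{val}(\mathcal{P}_d) \le K-1\big)\Big)$$

$$= \sum_{K=0}^{N} K\, \mathbb{P}\big(\mathrm{val}(\mathcal{P}_d) \le K\big) - \sum_{K=0}^{N-1} (K+1)\, \mathbb{P}\big(\mathrm{val}(\mathcal{P}_d) \le K\big)$$

$$= N\, \mathbb{P}\big(\mathrm{val}(\mathcal{P}_d) \le N\big) - \sum_{K=0}^{N-1} \mathbb{P}\big(\mathrm{val}(\mathcal{P}_d) \le K\big)$$

$$= N - \sum_{K=0}^{N-1} \mathbb{P}\big(\mathrm{val}(\mathcal{P}_d) \le K\big),$$

where we used (70) and (74).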
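
The summation by parts above holds for any distribution on $\{0, \ldots, N\}$, which can be checked with exact rational arithmetic. The sketch below is only a sanity check of the identity: the function name `expectation_two_ways` and the probability values are made up for illustration and are not derived from any dictionary.

```python
from fractions import Fraction


def expectation_two_ways(pmf):
    """Compare sum_K K*P(val = K) with N - sum_{K<N} P(val <= K).

    pmf[K] is P(val = K) for K = 0..N and must sum to 1.
    """
    N = len(pmf) - 1
    direct = sum(K * p for K, p in enumerate(pmf))
    cdf, running = [], Fraction(0)
    for p in pmf:
        running += p
        cdf.append(running)          # cdf[K] = P(val <= K)
    via_cdf = N - sum(cdf[:N])       # only K = 0..N-1 enter the sum
    return direct, via_cdf


if __name__ == "__main__":
    pmf = [Fraction(1, 10), Fraction(2, 10), Fraction(3, 10), Fraction(4, 10)]
    print(expectation_two_ways(pmf))  # both entries are equal
```
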

This yields the following Theorem. 

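Theorem 5 Let $f_d$ and $\|.\|$ be any two norms and $(\psi_i)_{i \in I}$ be a dictionary in $\mathbb{R}^N$. Let $\tau > 0$ and $\theta$ be such that $\theta > \tau \max_{0 \le K \le N} \Delta_K$, where $\Delta_K$ is defined in (52). Consider a random variable $d$ with uniform distribution on $\mathcal{L}_{f_d}(\theta)$. Then

$$N - \sum_{K=0}^{N-1} \frac{C_K}{\mathbb{L}^N(\mathcal{L}_{f_d}(1))} \Big(\frac{\tau}{\theta}\Big)^{N-K} \Big(1 + \delta_K \frac{\tau}{\theta}\Big)^{K} \;\le\; \mathbb{E}\big(\mathrm{val}(\mathcal{P}_d)\big) \;\le\; N - \sum_{K=0}^{N-1} \Big[\frac{C_K}{\mathbb{L}^N(\mathcal{L}_{f_d}(1))} \Big(\frac{\tau}{\theta}\Big)^{N-K} \Big(1 - \delta_K \frac{\tau}{\theta}\Big)^{K} - \frac{\varepsilon_0(K,\tau,\theta)}{\mathbb{L}^N(\mathcal{L}_{f_d}(1))}\Big],$$

where $\varepsilon_0(K,\tau,\theta)$ is given in Theorem 2, equation (55). Moreover we have the following asymptotical result:

$$\mathbb{E}\big(\mathrm{val}(\mathcal{P}_d)\big) = N - \frac{C_{N-1}}{\mathbb{L}^N(\mathcal{L}_{f_d}(1))}\, \frac{\tau}{\theta} + o\Big(\frac{\tau}{\theta}\Big) \quad \text{as } \frac{\tau}{\theta} \to 0.$$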

7 Illustration: Euclidean norms for ||.|| and fd 

Consider the situation when both $\|.\|$ and $f_d$ are the Euclidean norm on $\mathbb{R}^N$:

$$\|.\| = f_d = \|.\|_2 \quad \text{where } \|u\|_2 = \sqrt{\langle u, u \rangle}, \text{ with } \langle u, v \rangle = \sum_{i=1}^{N} u_i v_i. \tag{76}$$

Noticing that the Euclidean norm is rotation invariant, for any vector subspace $V \subset \mathbb{R}^N$ we have

$$P_{V^\perp}\big(\mathcal{L}_{\|.\|_2}(\tau)\big) = V^\perp \cap \mathcal{L}_{\|.\|_2}(\tau) = \{u \in V^\perp : \|u\|_2 \le \tau\}. \tag{77}$$

The equivalent norm $h$ and the constant $\Delta$ derived in Lemma 1 are simply

$$h(u) = \|u\|_2, \quad \forall u \in V^\perp,$$
$$\Delta = 1.$$

The constant $\delta_V$ in assertion (ii) of Proposition 1, defined by (20), reads $\delta_V = 1$. Then the inequality condition on $\theta$ and $\tau$ is simplified to $\theta > \tau$.

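The constant $C$ in (19) in the same proposition depends on $K$ (the dimension of the subspace $V$) and reads (see [7, p. 60] for details)

$$C = a(K)\, a(N-K) \stackrel{\mathrm{def}}{=} C(K),$$

where for any integer $n \ge 0$ we have

$$a(n) = \frac{\pi^{n/2}}{\Gamma(n/2 + 1)} \quad \text{for } \Gamma(n) = \int_0^{\infty} e^{-x} x^{n-1}\, dx. \tag{78}$$

Here $\Gamma$ is the usual Gamma function. Using that $\Gamma(n+1) = n\Gamma(n)$, we obtain, for $0 < K < N$,

$$C(K) = \frac{4\pi^{N/2}}{K (N-K)\, \Gamma\big(\frac{K}{2}\big)\, \Gamma\big(\frac{N-K}{2}\big)}. \tag{79}$$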
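
As a quick numerical check of (78) and (79), the product $a(K)\,a(N-K)$ can be compared with the closed form. This is a minimal sketch using only the standard library; the function names `unit_ball_volume`, `C` and `C_closed_form` are illustrative choices.

```python
import math


def unit_ball_volume(n: int) -> float:
    """a(n) from (78): volume of the Euclidean unit ball in R^n (a(0) = 1)."""
    return math.pi ** (n / 2) / math.gamma(n / 2 + 1)


def C(K: int, N: int) -> float:
    """C(K) = a(K) * a(N - K)."""
    return unit_ball_volume(K) * unit_ball_volume(N - K)


def C_closed_form(K: int, N: int) -> float:
    """Closed form (79); only meaningful for 0 < K < N."""
    return 4 * math.pi ** (N / 2) / (
        K * (N - K) * math.gamma(K / 2) * math.gamma((N - K) / 2)
    )


if __name__ == "__main__":
    N = 5
    for K in range(1, N):
        print(K, C(K, N), C_closed_form(K, N))  # the two columns agree
```
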



From the preceding, the constants $\delta_J$ and $C_J$ in Corollary 1 read

$$\delta_J = 1, \quad \forall J \subset I, \tag{80}$$
$$C_J = C(K), \tag{81}$$

where the expression of $C(K)$ is given in (79).

The norm $g$ arising in (32) in Proposition 2 reads

$$g(u) = \sup\{\|u_1\|_2 + \|u_2\|_2,\; \|u_1\|_2 + \|u_3\|_2\} = \|u_1\|_2 + \sup\{\|u_2\|_2, \|u_3\|_2\},$$

where $u = u_1 + u_2 + u_3$ is decomposed according to (31). Then

$$f_d(u) = \|u\|_2 \le \|u_1\|_2 + \|u_2\|_2 + \|u_3\|_2 \le \delta_{J_1,J_2}\, g(u), \quad \forall u \in \mathbb{R}^N, \quad \text{if } \delta_{J_1,J_2} = 2.$$

The constants $\delta_{J_1,J_2}$ and $Q_{J_1,J_2}$ in Proposition 2 read

$$\delta_{J_1,J_2} = 2, \tag{82}$$
$$Q_{J_1,J_2} = C(k), \tag{83}$$

where $C(k)$ is defined according to (79).

For any $K = 1, \ldots, N$, the constants $\delta_K$ and $C_K$ in (43)-(44) read

$$\delta_K = 1,$$
$$C_K = C(K)\, \#\mathcal{J}(K).$$

Clearly, $\#\mathcal{J}(K)$ depends on the dictionary.

The constants $\delta'_{K,k}$ and $Q_{K,k}$, introduced in (49) and (50), respectively, are

$$\delta'_{K,k} = 2, \tag{84}$$
$$Q_{K,k} = C(k)\, \#\mathcal{H}(K,k). \tag{85}$$
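Here again, $\#\mathcal{H}(K,k)$ depends on the choice of dictionary and in any case, $\#\mathcal{H}(K,k) = 0$ for $k < k_K$ (where $k_K$ is defined in (48)). The constant in (52) is $\Delta_K = 2$ and the inequality (53) is satisfied.
The main inequality in Theorem 2 now reads

$$C(K)\, \#\mathcal{J}(K)\, \tau^{N-K} (\theta - \tau)^K - \theta^N \varepsilon_0(K,\tau,\theta) \;\le\; \mathbb{L}^N\big(\mathcal{T}^\tau(K) \cap \mathcal{L}_{f_d}(\theta)\big),$$

where $C(K)$ is defined by (79) and the error term $\varepsilon_0(K,\tau,\theta)$ is

$$\varepsilon_0(K,\tau,\theta) = \frac{1}{\theta^N} \sum_{k=k_K}^{K-1} C(k)\, \#\mathcal{H}(K,k)\, \tau^{N-k} (\theta + 2\tau)^{k}.$$

In order to provide the statistical interpretation in Section 6 we notice that $\mathbb{L}^N(\mathcal{L}_{f_d}(1)) = a(N)$ for $a(.)$ as given in (78), and hence

$$\mathbb{L}^N\big(\mathcal{L}_{f_d}(1)\big) = \frac{\pi^{N/2}}{\Gamma(N/2 + 1)}.$$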
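
A small case where everything can be computed by hand may help to read these formulas. The sketch below takes $N = 2$, $K = 1$ and, as an assumption made only for this illustration, the canonical basis of $\mathbb{R}^2$ as dictionary. The data with $\mathrm{val}(\mathcal{P}_d) \le 1$ inside the disc of radius $\theta$ then form the union of two strips of half-width $\tau$ around the coordinate axes, whose area is known in closed form. The counts $\#\mathcal{J}(1) = 2$ and $\#\mathcal{H}(1,0) = 2$ (ordered pairs of distinct lines) are my reading of the definitions, and the upper bound is taken from (54) with $\delta_K = 1$.

```python
import math


def a(n: int) -> float:
    """Volume of the Euclidean unit ball in R^n, see (78)."""
    return math.pi ** (n / 2) / math.gamma(n / 2 + 1)


def exact_area(tau: float, theta: float) -> float:
    """Area of {|d|_2 <= theta} ∩ ({|d_1| <= tau} ∪ {|d_2| <= tau}).

    Valid when sqrt(2) * tau <= theta, so that the overlap of the two
    strips is the full square of side 2 * tau.
    """
    strip = 2 * (tau * math.sqrt(theta**2 - tau**2) + theta**2 * math.asin(tau / theta))
    return 2 * strip - 4 * tau**2


def euclidean_bounds(tau: float, theta: float):
    """Bounds of Theorem 2 for N = 2, K = 1 in the Euclidean case."""
    N, K = 2, 1
    n_J = 2  # assumption: two one-dimensional subspaces (the axes)
    n_H = 2  # assumption: ordered pairs of distinct axes, intersection of dim 0
    main = a(K) * a(N - K) * n_J * tau ** (N - K)
    # theta^N * eps_0 reduces to the single k = 0 term C(0) * #H * tau^N.
    error = a(0) * a(N) * n_H * tau**N
    return main * (theta - tau) ** K - error, main * (theta + tau) ** K


if __name__ == "__main__":
    for tau in (0.3, 0.1, 0.01):
        lower, upper = euclidean_bounds(tau, 1.0)
        print(tau, lower, exact_area(tau, 1.0), upper)
```

In this toy case the exact area lies between the two bounds, and the gap between them shrinks like $\tau^2$ while the area itself is of order $\tau$.
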

8 Conclusion and perspectives 

In this paper, we derive lower and upper bounds for different quantities concerning the model $(\mathcal{P}_d)$. Typically, the difference between the upper and the lower bound has an order of magnitude $(\frac{\tau}{\theta})^{N-K+1}$ while the quantities which are estimated are proportional to $(\frac{\tau}{\theta})^{N-K}$. The difference between the upper and lower bounds is made of

• The terms $\theta \pm \delta_V \tau$ which come from the inclusions $B_0 \subset V^\tau \cap \mathcal{L}_{f_d}(\theta) \subset B_1$ in the proof of Proposition 1. This approximation is of the order $(\frac{\tau}{\theta})^{N-K+1}$. It may be possible to reach a larger order of magnitude (e.g. $(\frac{\tau}{\theta})^{N-K+2}$) under the assumption that $f_d$ is regular away from 0 (e.g. twice differentiable). This would make it possible to improve Proposition 1 and the theorems that use its conclusions.

• A term of the form $-\theta^N \varepsilon_0(K,\tau,\theta)$ could be added to the upper bound in (54). This term is not present because of the approximation made in (58). Such a term "$-\theta^N \varepsilon_0(K,\tau,\theta)$" could be obtained by computing the size of the intersection of more than two cylinder-like sets in Proposition 2 (doing so we would also avoid the approximation in (62)) and by improving this proposition by bounding $\mathbb{L}^N(T_{J_1}^\tau \cap T_{J_2}^\tau \cap \mathcal{L}_{f_d}(\theta))$ from below. This is probably a straightforward adaptation of the current proof of Proposition 2.

This improvement is possible but not necessary in this paper since (again) this approximation yields an error whose order of magnitude is $(\frac{\tau}{\theta})^{N-K+1}$. In any case, we cannot get a better order of magnitude unless the approximation mentioned in the previous item is improved (i.e. more regularity is assumed for $f_d$).

Besides those aspects, several future developments can be envisaged: 
• An important improvement would be to assume a more specialized form for the data distribution.
One first step would be a distribution of the shape oc e"-'''*'"'^ which is continuous. In our opinion, 
one possible goal is to deal with a data distribution defined by a kernel. This is indeed one of the
standard techniques used in machine learning theory to approximate data distributions.

• Another way of improvement is to adapt those results to the context of infinite dimensional spaces. 
This adaptation might not be trivial since (for instance) there is no Lebesgue measure in those spaces. 

• We are also preparing a paper where a similar analysis is performed for the Basis Pursuit Denoising
(i.e. $\ell_1$ regularization) with the same asymptotic. It will clearly show what is in common and what
are the differences between $\ell_0$ and $\ell_1$ regularization.

• Performing a similar analysis for the Orthogonal Matching Pursuit would, of course, be an interesting
and complementary result.

• In a forthcoming work, we develop the theory in the context of orthogonal bases instead of general
dictionaries (frames). This simplification of the hypotheses greatly simplifies the formulas of the current
paper and illustrates them.